Skip to content

Conversation

@speaking-in-code
Copy link
Contributor

Description

Fixes #24703.

Flutter driver has problems when flutter apps use more than a single isolate. The other isolates don't start, so apps malfunction. This PR fixes the problem by listening for new isolates to become runnable, and automatically starting them.

There was significant discussion of a prior version of this fix in #63167. I wrecked my git change history with a bad merge, so this is a cleaned up version of the PR.

Related Issues

#24703.

Tests

I added the following tests:

I added the following tests:

  • Added unit tests in flutter_driver_test.dart to verify that all isolates have resume() invoked.
  • Ran devicelab tests for the flutter gallery app, verifying that the bug that caused this change to be reverted last time (Revert "Make sure all isolates start during flutter driver tests." #62239) doesn't reoccur.
  • Added a new example app that uses isolates, and tested it on several platfoms: macos, android simulator and physical devices, ios simulator and physical devices.
  • Manually tested against an app that uses isolates started from native plugins.

Checklist

Before you create this PR, confirm that it meets all requirements listed below by checking the relevant checkboxes ([x]). This will ensure a smooth and quick review process.

  • I read the [Contributor Guide] and followed the process outlined there for submitting PRs.
  • I signed the [CLA].
  • I read and followed the [Flutter Style Guide], including [Features we expect every widget to implement].
  • I read the [Tree Hygiene] wiki page, which explains my responsibilities.
  • I updated/added relevant documentation (doc comments with ///).
  • All existing and new tests are passing.
  • The analyzer (flutter analyze --flutter-repo) does not report any problems on my PR.
  • I am willing to follow-up on review comments in a timely manner.

Breaking Change

Did any tests fail when you ran them? Please read [Handling breaking changes].

  • No, no existing tests failed, so this is not a breaking change.
  • Yes, this is a breaking change.

@flutter-dashboard flutter-dashboard bot added a: tests "flutter test", flutter_test, or one of our tests d: examples Sample code and demos framework flutter/packages/flutter repository. See also f: labels. c: contributor-productivity Team-specific productivity, code health, technical debt. labels Aug 24, 2020
@speaking-in-code speaking-in-code force-pushed the master branch 5 times, most recently from 07359bb to 54e136e Compare August 24, 2020 05:16
@guidezpl
Copy link
Member

Thanks for your work, looks great! Only thing is I don't think the examples directory is the best place, github.com/flutter/tests may be it.

For the VMNoneEvent issue, have you seen https://github.com/flutter/flutter/blob/54e136e70dbfd0810c5fe24b841e9f015e6795cd/packages/flutter_driver/lib/src/driver/vmservice_driver.dart#L106?

@guidezpl
Copy link
Member

guidezpl commented Aug 24, 2020

Internal testing is much happier with this change, only 1 failed test. A hot reload change on Android that crashes during tearDownAll with driver.close().

Root cause may be #26953 (cc @DanTup)

Relevant logs:

00:21 +0: can hot reload the app                                                                                                                                                                       
00:22 +0: can hot reload the app                                                                                                                                                                       
00:23 +0: can hot reload the app                                                                                                                                                                       
VMServiceFlutterDriver: Connecting to Flutter application at http://[::1]:42893/Ejuscjt8zzM=/
VMServiceFlutterDriver: Isolate found with number: 2859452749925111
VMServiceFlutterDriver: Isolate is not paused. Assuming application is ready.
VMServiceFlutterDriver: Connected to Flutter application.

00:24 +0: can hot reload the app                                                                                                                                                                       
00:25 +0: can hot reload the app                                                                                                                                                                       
00:26 +0: can hot reload the app                                                                                                                                                                       
VMServiceFlutterDriver: New runnable isolate 4359891890788315

00:26 +1: can hot reload the app                                                                                                                                                                       
00:26 +1: (tearDownAll)                                                                                                                                                                                
00:26 +0 -1: can hot reload the app [E]                                                                                                                                                                
  Bad state: The client closed with pending request "getIsolate".
  package:json_rpc_2/src/client.dart 70:24                                              new Client.withoutJson.<fn>
  ===== asynchronous gap ===========================
  dart:async                                                                            _Completer.completeError
  package:json_rpc_2/src/client.dart 67:27                                              new Client.withoutJson.<fn>
  ===== asynchronous gap ===========================
  dart:async                                                                            Future.whenComplete
  package:json_rpc_2/src/client.dart 65:10                                              new Client.withoutJson
  package:json_rpc_2/src/peer.dart 90:22                                                new Peer.withoutJson
  package:vm_service_client/vm_service_client.dart 133:33                               new VMServiceClient.withoutJson
  package:vm_service_client/vm_service_client.dart 125:11                               new VMServiceClient
  package:flutter_driver/src/driver/vmservice_driver.dart 652:9                         _waitAndConnect
  ===== asynchronous gap ===========================
  dart:async                                                                            _asyncThenWrapperHelper
  package:flutter_driver/src/driver/vmservice_driver.dart 92:39                         VMServiceFlutterDriver.connect
  package:flutter_driver/src/driver/driver.dart 156:35                                  FlutterDriver.connect
  google3:///mobile/flutter/tests/app/test/hot_reload_with_code_change_test.dart 81:34  main.<fn>
  ===== asynchronous gap ===========================
  dart:async                                                                            _asyncThenWrapperHelper
  google3:///mobile/flutter/tests/app/test/hot_reload_with_code_change_test.dart        main.<fn>
  package:mobile.flutter.testing/runner_tests.dart 114:15                               testWithParameters.<fn>
  ===== asynchronous gap ===========================
  dart:async                                                                            _asyncThenWrapperHelper
  package:mobile.flutter.testing/runner_tests.dart                                      testWithParameters.<fn>

Interestingly, adding a tiny delay caused the test to pass.

00:23 +0: can hot reload the app                                                                                                                                                                       
VMServiceFlutterDriver: Connecting to Flutter application at http://[::1]:47179/CjIgblAQO7I=/
VMServiceFlutterDriver: Isolate found with number: 2172508940739699

00:23 +0: can hot reload the app                                                                                                                                                                       
Instance of 'VM'
VMServiceFlutterDriver: Isolate is not paused. Assuming application is ready.
VMServiceFlutterDriver: Connected to Flutter application.
VMServiceFlutterDriver: New runnable isolate 1579049678870519

00:26 +1: can hot reload the app                                                                                                                                                                       
00:26 +1: (tearDownAll)                                                                                                                                                                                
VMServiceFlutterDriver: Isolate is not paused. Assuming application is ready.

00:26 +1: All tests passed!                                                                                                                                                                            
Cleaning up...

@DanTup
Copy link
Contributor

DanTup commented Aug 24, 2020

I don't have any visibility of internal code so I can't offer much insight, but it seems like the json_rpc_2 package throws if the client closes the connection with outstanding requests. If a small delay fixes it, I'd guess whatever is triggering the getIsolate request isn't being awaited (perhaps because the test doesn't care about it), so awaiting it might be a nicer/more reliable fix than a delay?

(I'm not sure if it's related to #26953 though - the extra isolates in that issue are weird, but I wouldn't expect them to affect getting a response to getIsolate - especially if a small delay fixes it)

@speaking-in-code speaking-in-code force-pushed the master branch 2 times, most recently from 99c0872 to 9ee9bfe Compare August 31, 2020 00:40
@speaking-in-code
Copy link
Contributor Author

Remembering to cancel the subscription fixed the bug.

@speaking-in-code
Copy link
Contributor Author

The presubmit checks look OK, let me know if you think there is more for me to do on those. The windows failure failure looks unrelated to my change, I'm guessing that's a problem at head, not something new with this pull request.

@liyuqian liyuqian requested review from dnfield and guidezpl August 31, 2020 21:56
@liyuqian
Copy link
Contributor

I'm rerunning "hostonly_devicelab_tests-2-windows" to see if the failure is just a flaky test.

@speaking-in-code
Copy link
Contributor Author

Woohoo! Just waiting for the flutter build to be fixed now.

Is someone willing/able to tag this with waiting for tree to go green?

@guidezpl
Copy link
Member

guidezpl commented Sep 2, 2020

It's really unfortunate that that test still isn't passing. @dnfield I've looked in this and can't determine the root cause. Could you please have a look? (sponge2/d7d06397-8953-429d-a446-fcc58ab25236)

@speaking-in-code
Copy link
Contributor Author

Can someone send me a log for the failing test? (Or is it new_gallery_transition_perf/new_gallery_ios__transition_perf? I can run those directly.)

If the error is still of the variety Bad state: The client closed with pending request "ext.flutter.driver", perhaps we should catch exceptions as the VM service client closes?

@dnfield
Copy link
Contributor

dnfield commented Sep 2, 2020

There seem to be two different errors.

Type A:

Bad state: The client closed with pending request "getIsolate".
package:json_rpc_2/src/client.dart 70:24                                              new Client.withoutJson.<fn>
===== asynchronous gap ===========================
dart:async                                                                            _Completer.completeError
package:json_rpc_2/src/client.dart 67:27                                              new Client.withoutJson.<fn>
===== asynchronous gap ===========================
dart:async                                                                            Future.whenComplete
package:json_rpc_2/src/client.dart 65:10                                              new Client.withoutJson
package:json_rpc_2/src/peer.dart 90:22                                                new Peer.withoutJson
package:vm_service_client/vm_service_client.dart 133:33                               new VMServiceClient.withoutJson
package:vm_service_client/vm_service_client.dart 125:11                               new VMServiceClient
package:flutter_driver/src/driver/vmservice_driver.dart 654:9                         _waitAndConnect
===== asynchronous gap ===========================
dart:async                                                                            _asyncThenWrapperHelper
package:flutter_driver/src/driver/vmservice_driver.dart 109:35                        VMServiceFlutterDriver.connect
package:flutter_driver/src/driver/driver.dart 156:35                                  FlutterDriver.connect

Type B:

Unexpected <collected> sentinel.
package:vm_service_client/src/isolate.dart 244:7                                  VMIsolateRef.load
===== asynchronous gap ===========================
dart:async                                                                        _asyncErrorWrapperHelper
package:flutter_driver/src/driver/vmservice_driver.dart                           VMServiceFlutterDriver._initIsolateHandler.<fn>
dart:async                                                                        _EventSinkWrapper.add
package:vm_service_client/vm_service_client.dart 147:12                           new VMServiceClient._.<fn>
dart:async                                                                        _BroadcastStreamController.add
package:vm_service_client/src/stream_manager.dart 69:30                           new StreamManager.<fn>
package:json_rpc_2/src/server.dart 212:30                                         Server._handleSingleRequest
package:json_rpc_2/src/server.dart 188:24                                         Server._handleRequest
dart:async                                                                        _StreamController.add
package:json_rpc_2/src/peer.dart 128:36                                           Peer.listen.<fn>
===== asynchronous gap ===========================
dart:async                                                                        _BoundSinkStream.listen
package:flutter_driver/src/driver/vmservice_driver.dart 47:44                     VMServiceFlutterDriver._initIsolateHandler
package:flutter_driver/src/driver/vmservice_driver.dart 39:32                     new VMServiceFlutterDriver.connectedTo
package:flutter_driver/src/driver/vmservice_driver.dart 136:66                    VMServiceFlutterDriver.connect
===== asynchronous gap ===========================
dart:async                                                                        _asyncThenWrapperHelper
package:flutter_driver/src/driver/driver.dart 156:35                              FlutterDriver.connect

These are happening in the host script - the logs on the emulators aren't showing any obvious errors to me.

@dnfield
Copy link
Contributor

dnfield commented Sep 2, 2020

One red flag here is that driver is still using package:vm_service_client instead of package:vm_service. It's possible there's some bug in vm_service_client, which is discontinued.

@speaking-in-code
Copy link
Contributor Author

Interesting. Two different errors, both happening while establishing the VM connection. I have not seen these at all when testing locally.

Any additional information available about what these test cases are trying to test for? Is this a hot restart/reload during the initial setup? I can probably recreate that condition locally if I add sleep() statements in enough places. =)

I did notice that the vm service client library has some deprecation warnings. Anybody have thoughts on how much work it would be to replace it with the modern version...?

@speaking-in-code
Copy link
Contributor Author

The type B error makes sense to me and is fixable. There's a race condition between the isolate becoming runnable and the VM canceling/removing the isolate for some other reason. The isolateRef.load() call can fail, and that's harmless. Catching the exception would fix it.

The type A error is still confusing. Any ideas about what could cause that would be helpful, I suspect I'm misreading the code flow.

@guidezpl
Copy link
Member

guidezpl commented Sep 3, 2020

Context for the type A error:

The test is a wrapper/host for a counter app that auto-increments itself. The runner is sent the letters 'r' and 'R' to trigger a hot reload and hot restart, respectively. From the video recording, the test crashes sometime after a successful hot restart, possibly during tearDown

@speaking-in-code
Copy link
Contributor Author

I was able to reproduce this by adding a sleep statement towards the end of a test case, and then repeatedly restarting the app.

Somewhat different root causes for A and B.

The uncollected sentinel error happens if the onIsolateRunnable() callback produces a new isolate, but the isolate is destroyed before the call to isolateRef.load(). Fix is to handle that exception gracefully.

The pending call error is more subtle, but making sure to wait for the results of _resumeLeniently in the onIsolateRunnable callback seems to have fixed it. I think the issue here was that stream cancellation waits for the pending callback to finish. Before: stream cancellation wrapped up early, then the vm service client close would interrupt the pending call. Now: stream cancellation waits for the pending callback to finish.

Finally, there is still a race condition during driver connection: if the VM is restarted while we're still loading the first isolate in connect(), things go badly. That's an existing problem, and it's unclear what correct behavior would be. Maybe a retry? I'm not inclined to fix it in this pull request, but let me know if you want me to go for it.

@guidezpl
Copy link
Member

guidezpl commented Sep 7, 2020

Nice! We're also updating the internal tests to use package:vm_service instead of package:vm_service_client. I'll let @dnfield answer that last question.

@dnfield
Copy link
Contributor

dnfield commented Sep 8, 2020

I'm not sure what the infra failure here is about - @godofredoc or @digiter may know.

@guidezpl - do these changes now pass internally? Switching to package:vm_service will likely require changes upstream in flutter/driver.

@digiter
Copy link
Contributor

digiter commented Sep 8, 2020

We have an update on the builders, please rebase to the HEAD of master and retry tests.

@speaking-in-code
Copy link
Contributor Author

testonly_devicelab_tests is still timing out after 3 hours, with no debug output that I've been able to find.

Any tips on debugging this? Or on reproducing the problem locally?

@godofredoc
Copy link
Contributor

testonly_devicelab_tests is still timing out after 3 hours, with no debug output that I've been able to find.

Any tips on debugging this? Or on reproducing the problem locally?

Please rebase and upload a new commit. Those tests were removed like a week ago.

@speaking-in-code
Copy link
Contributor Author

Weird, I thought I rebased yesterday, but I still had the failing ancient tests. Just rebased again, and this time it seems to have gone well.

Crossing my fingers that the tests pass!

@fluttergithubbot fluttergithubbot merged commit ccd4f6d into flutter:master Sep 10, 2020
renyou added a commit that referenced this pull request Sep 11, 2020
renyou added a commit that referenced this pull request Sep 11, 2020
renyou added a commit to renyou/flutter that referenced this pull request Sep 11, 2020
renyou added a commit that referenced this pull request Sep 11, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

a: tests "flutter test", flutter_test, or one of our tests c: contributor-productivity Team-specific productivity, code health, technical debt. d: examples Sample code and demos framework flutter/packages/flutter repository. See also f: labels.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

flutter_driver pauses all isolates at their startup (even ones started from compute()), rather than only pausing initial isolate

9 participants