Skip to content

Conversation

@gaaclarke
Copy link
Member

@gaaclarke gaaclarke commented Sep 18, 2020

Description

HttpClientRequest.close is randomly hanging then throwing a generic exception after 4 minutes. I've filed a bug against dart and introduced a timeout in flutter since the request that would fail to converge on close would succeed on the second attempt in my testing.

I was able to validate with local testing that this addresses the issue and the 500ms timeout is well above my observed time for close to succeed which was more like 10ms in my observations.

cc @jmagman @xster

Related Issues

#63869
Fixes #65402

Tests

I added the following tests:

none

Checklist

Before you create this PR, confirm that it meets all requirements listed below by checking the relevant checkboxes ([x]). This will ensure a smooth and quick review process.

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I signed the CLA.
  • I read and followed the Flutter Style Guide, including Features we expect every widget to implement.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I updated/added relevant documentation (doc comments with ///).
  • All existing and new tests are passing.
  • The analyzer (flutter analyze --flutter-repo) does not report any problems on my PR.
  • I am willing to follow-up on review comments in a timely manner.

Breaking Change

Did any tests fail when you ran them? Please read Handling breaking changes.

@flutter-dashboard flutter-dashboard bot added the tool Affects the "flutter" command-line tool. See also t: labels. label Sep 18, 2020
@flutter-dashboard
Copy link

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat.

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

@jmagman
Copy link
Member

jmagman commented Sep 19, 2020

Were you ever able to repro? I'd rather know what changed between 1.17 and 1.20 and track down the bug, rather than just timing out. @zanderso was investigating this area with #54123. There was also a lot of asset work done this release. @jonahwilliams @dnfield

@zanderso
Copy link
Member

Ah, that PR was something else.

As for this PR, 4 minutes sounds like an OS-level connection timeout. We might be able to bring that in by setting connectionTimeout on the HttpClient. Of course, that won't help if the client is already connected, and is timing out afterwards.

However, client request output is buffered (buffer size 8KB): https://api.dart.dev/dev/2.10.0-146.0.dev/dart-io/HttpClientRequest/bufferOutput.html, so we might not be trying to connect and send anything until the close() call.

If we think the transfer is taking too long, we should use abort() https://api.dart.dev/dev/2.10.0-146.0.dev/dart-io/HttpClientRequest/abort.html after the timeout to clean up the connection properly instead of leaking it for 4 minutes.

As @jmagman says however, it would be nice to find the root cause before adding the timeout, but adding the timeout is not terrible here since we've already got a retry loop.

@jonahwilliams
Copy link
Contributor

Should we be using platform specific tools in order to upload binaries/assets to each device instead of HTTP? Once we have the devFS location presumably we could use something like adb push/ios-deploy something. Is that more reliable?

@jonahwilliams
Copy link
Contributor

Wrote up go/flutter-improving-devfs-reliability to discuss some options

@jonahwilliams
Copy link
Contributor

In general I think playing with the timeouts here is a bit scary. We don't really know how long it should take to sync an asset

@gaaclarke
Copy link
Member Author

gaaclarke commented Sep 21, 2020

@jmagman @jonahwilliams @zanderso Yes, I was able to reproduce it. The bug is in closing the http request. It is a bug in the Dart http library. I've filed an issue.

We already have timeouts for this call. They are 4 minutes long which is dictated by Dart. This timeout is one that is more amenable to our users and is 50x the observed duration during normal operation. Closing the request happens after the data has already been sent, so we do know how long it should take. The call before close is the one that scales based on the size of the payload.

Regardless, the devfs sync operation retries 10 times, so even if it somehow prematurely times out, it would have to exhibit the pathological case 9 more times and prematurely timeout 9 times which is highly unlikely. (in the bug reports and my observation the stalled asset has always succeeded the second time).

@gaaclarke
Copy link
Member Author

One other thing to consider is that even if Dart fixes the http code, we will still want an apropos timeout.

@gaaclarke
Copy link
Member Author

It's also possible that the bug is in the devfs server. Do we own that code? I'm thinking it might possibly be closing a connection when it receives the payload before returning a status code, that would match my observations.

@jonahwilliams
Copy link
Contributor

Yes, the devFS is part of the dart vm service: https://github.com/dart-lang/sdk/blob/master/sdk/lib/vmservice/devfs.dart

@jonahwilliams
Copy link
Contributor

there are a few associated files there, but do note that we use the HTTP API and not the JSON-RPC API

@jonahwilliams
Copy link
Contributor

I agree that fixing this may involve changing/adding timeouts, but given that we don't really seem to know the cause I'm a bit concerned that we'd just be sort of papering-over some other issue. If you can reproduce the syncing issues fairly consistently it seems like the best course of action would be to identify the actual error condition we're hitting on the device/devFS

@jmagman
Copy link
Member

jmagman commented Sep 21, 2020

Yes, I was able to reproduce it.

I was never able to get it to repro. I can help bisecting if I can figure out how to do so.

@jmagman
Copy link
Member

jmagman commented Sep 21, 2020

Should we be using platform specific tools in order to upload binaries/assets to each device instead of HTTP?

Just to link it here, @jonahwilliams opened #66266

@gaaclarke
Copy link
Member Author

I was able to capture the error in pcap:
no-response-put-devfs.pcapng.gz

The HTTP PUT is number 1547. The server never sends us a status code back for that PUT despite having sent the data. The retry is number 2086.

Here is the logs (with my patch recovering):

[  +24 ms] Connecting to service protocol: http://localhost:63162
[        ] DDS is currently disabled due to https://github.com/flutter/flutter/issues/62507
[  +24 ms] Successfully connected to service protocol: http://localhost:63162
[   +1 ms] Waiting for iPhone (2) to report its views...
[   +4 ms] Waiting for iPhone (2) to report its views... (completed in 3ms)
[   +6 ms] DevFS: Creating new filesystem on the device (null)
[   +9 ms] DevFS: Created new filesystem on the device
(file:///private/var/mobile/Containers/Data/Application/9BFD0295-83EC-4888-821D-3EAFB5BD8D50/tmp/stable_flutterL2ODmx/stable_flutter/)
[   +2 ms] Updating assets
[  +82 ms] Manifest contained wildcard assets. Inserting missing file into build graph to force rerun. for more information see #56466.
[   +6 ms] Syncing files to device iPhone (2)...
[   +1 ms] Scanning asset files
[   +2 ms] <- reset
[        ] Compiling dart to kernel with 0 updated files
[   +1 ms] <- recompile package:stable_flutter/main.dart d855deb7-0e70-446b-9333-9243cf07f514
[        ] <- d855deb7-0e70-446b-9333-9243cf07f514
[  +39 ms] Updating files
[ +539 ms] Error writing "build/flutter_assets/assets/icons/star.png" to DevFS: TimeoutException after 0:00:00.500000: Future not completed
[        ] trying again in a few - 9 more attempts left
[ +518 ms] DevFS: Sync finished

I'll add these details to the Dart issue as well.

I'm a bit concerned that we'd just be sort of papering-over some other issue

That's usually how internet protocols work, that's why there is the "Robustness principle" that dictates that we be liberal with what we accept. That means we need to consider the fact that the server, now and in the future, may not return a status code to our PUT request.

@jonahwilliams
Copy link
Contributor

We control both the client and the server here. If there is an ambient bug in the devFS logic that is only being triggered on iOS, we need to track that down.

@gaaclarke
Copy link
Member Author

I've been able to verify that request.response.close() isn't getting called in the pathological case:

The following patch yields:

--- a/sdk/lib/_internal/vm/bin/vmservice_server.dart
+++ b/sdk/lib/_internal/vm/bin/vmservice_server.dart
@@ -307,13 +307,16 @@ class Server {
             fsPath = fsPathList[0];
           }
         }
-      } catch (e) {/* ignore */}
+      } catch (e) {
+        print('vmservice_server: error $e');
+      }
 
       String result;
       try {
         result = await _service.devfs.handlePutStream(fsName, fsPath, fsUri,
             request.cast<List<int>>().transform(gzip.decoder));
       } catch (e) {
+        print('vmservice_server: handlePutStream error $e');
         request.response.statusCode = HttpStatus.internalServerError;
         request.response.write(e);
         request.response.close();
@@ -326,6 +329,7 @@ class Server {
         request.response.write(result);
       }
       request.response.close();
+      print('vmservice_server: request close! ${fsUri ?? ''}');
       return;
     }
     if (request.method != 'GET') {
default	15:21:31.051786-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/icons/star.png
default	15:21:31.052525-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/images/DS_Store
default	15:21:31.052836-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/icons/.DS_Store
default	15:21:31.053275-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/icons/DS_Store
default	15:21:31.054822-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/images/dash.png
default	15:21:31.061486-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/AssetManifest.json
default	15:21:31.063394-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/FontManifest.json
default	15:21:31.081074-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/packages/cupertino_icons/assets/CupertinoIcons.ttf
default	15:21:31.084466-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/NOTICES
default	15:21:31.117615-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/fonts/MaterialIcons-Regular.otf
default	15:21:31.153009-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/videos/test_video_1.mp4
default	15:21:32.028810-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/images/.DS_Store

When:

[ +535 ms] Error writing "build/flutter_assets/assets/images/.DS_Store" to DevFS: TimeoutException after 0:00:00.500000: Future not completed
[        ] trying again in a few - 9 more attempts left

@gaaclarke
Copy link
Member Author

When printing out the put messages as well as when we close the response we find that that code never got the PUT request despite what the client thinks:

server:

default	15:32:30.857713-0700	Runner	flutter: vmservice_server: PUT build/flutter_assets/assets/icons/.DS_Store
default	15:32:30.864622-0700	Runner	flutter: vmservice_server: PUT build/flutter_assets/assets/icons/DS_Store
default	15:32:30.865021-0700	Runner	flutter: vmservice_server: PUT build/flutter_assets/assets/images/.DS_Store
default	15:32:30.865427-0700	Runner	flutter: vmservice_server: PUT build/flutter_assets/assets/icons/star.png
default	15:32:30.900191-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/images/.DS_Store
default	15:32:30.901053-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/icons/star.png
default	15:32:30.901350-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/icons/DS_Store
default	15:32:30.901582-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/icons/.DS_Store
default	15:32:30.910111-0700	Runner	flutter: vmservice_server: PUT build/flutter_assets/AssetManifest.json
default	15:32:30.912617-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/AssetManifest.json
default	15:32:30.912967-0700	Runner	flutter: vmservice_server: PUT build/flutter_assets/assets/videos/test_video_1.mp4
default	15:32:30.914638-0700	Runner	flutter: vmservice_server: PUT build/flutter_assets/packages/cupertino_icons/assets/CupertinoIcons.ttf
default	15:32:30.917633-0700	Runner	flutter: vmservice_server: PUT build/flutter_assets/fonts/MaterialIcons-Regular.otf
default	15:32:30.921756-0700	Runner	flutter: vmservice_server: PUT build/flutter_assets/FontManifest.json
default	15:32:30.923954-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/FontManifest.json
default	15:32:30.928618-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/packages/cupertino_icons/assets/CupertinoIcons.ttf
default	15:32:30.934921-0700	Runner	flutter: vmservice_server: PUT build/flutter_assets/NOTICES
default	15:32:30.947721-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/NOTICES
default	15:32:30.971329-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/fonts/MaterialIcons-Regular.otf
default	15:32:31.006757-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/videos/test_video_1.mp4
default	15:32:31.873486-0700	Runner	flutter: vmservice_server: PUT build/flutter_assets/assets/images/DS_Store
default	15:32:31.877027-0700	Runner	flutter: vmservice_server: PUT build/flutter_assets/assets/images/dash.png
default	15:32:31.889961-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/images/DS_Store
default	15:32:31.896555-0700	Runner	flutter: vmservice_server: request close! build/flutter_assets/assets/images/dash.png

client:

[  +40 ms] Updating files
[ +534 ms] Error writing "build/flutter_assets/assets/images/DS_Store" to DevFS: TimeoutException after 0:00:00.500000: Future not completed
[        ] trying again in a few - 9 more attempts left
[        ] Error writing "build/flutter_assets/assets/images/dash.png" to DevFS: TimeoutException after 0:00:00.500000: Future not completed
[        ] trying again in a few - 9 more attempts left
[ +530 ms] DevFS: Sync finished

Notice that build/flutter_assets/assets/images/DS_Store is only showing up once on the server side.

@gaaclarke
Copy link
Member Author

I investigated using HttpClient.connectionTimeout and it didn't catch the problem case:

[   +2 ms] Updating assets
[  +81 ms] Manifest contained wildcard assets. Inserting missing file into build graph to force rerun. for more information see #56466.
[   +5 ms] Syncing files to device iPhone (2)...
[   +1 ms] Scanning asset files
[   +1 ms] <- reset
[        ] Compiling dart to kernel with 0 updated files
[   +1 ms] <- recompile package:stable_flutter/main.dart d2c1dc4a-4220-4815-8395-44f784c2bfaf
[        ] <- d2c1dc4a-4220-4815-8395-44f784c2bfaf
[  +38 ms] Updating files
[+239078 ms] Error writing "build/flutter_assets/assets/icons/.DS_Store" to DevFS: HttpException: , uri = http://localhost:50448
[        ] trying again in a few - 9 more attempts left
[ +525 ms] DevFS: Sync finished
--- a/packages/flutter_tools/lib/src/devfs.dart
+++ b/packages/flutter_tools/lib/src/devfs.dart
@@ -220,7 +220,9 @@ class _DevFSHttpWriter {
     : httpAddress = serviceProtocol.httpAddress,
       _client = httpClient,
       _osUtils = osUtils,
-      _logger = logger;
+      _logger = logger {
+        _client.connectionTimeout = Duration(milliseconds:500);
+      }
 
   final HttpClient _client;
   final OperatingSystemUtils _osUtils;
@@ -273,7 +275,7 @@ class _DevFSHttpWriter {
         // The contents has already been streamed, closing the request should
         // not take long but we are experiencing hangs with it, see #63869.
         final HttpClientResponse response =
-            await request.close().timeout(const Duration(milliseconds: 500));
+            await request.close();
         response.listen((_) {},
           onError: (dynamic error) {
             _logger.printTrace('error: $error');

@gaaclarke
Copy link
Member Author

I updated the PR so that it aborts the request in the case of a timeout.

@gaaclarke
Copy link
Member Author

I did some further investigation into the root cause of the missing status code and filed a dart bug here: dart-lang/sdk#43525

It seems like there might be something gobbled up in _HttpParser. Root cause wasn't established.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs for abort() say that this will cause HttpClientRequest.done to be completed with an error, which I would have expected to be propagated to the containing zone, skipping over the catch block below since it is an asynchronous error.

Even if that isn't what should happen, we can be on the safe side and put an await request.done after this line, which should definitely force the error into the catch block below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change. I tried to understand what the concern is and had a hard time figuring it out. Is your concern that there was some asynchronous operations in abort that could be short circuited the TimeoutException?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without awaiting the request's done Future, the TimeoutException would be rethrown and caught below as expected, but in addition, abort() will complete the done Future with an error (probably in a later microtask). When a Future is completed with an error, and it doesn't have an error handling callback attached with e.g. Future.catchError or isn't awaited, then the error skips over synchronous try {} catch {} blocks, and is passed to the unhandled exception callback of the containing Zone. We don't want that to happen here because that callback will report the exception to crash logging.

If you aren't seeing the abort() result in an error like that, then that behavior doesn't match the API docs...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is passed to the unhandled exception callback of the containing Zone

Ah, ok. That's the piece I was missing, thanks. I was thinking the the abort exception would silently be ignored since there was no one to catch it.

Copy link
Member

@zanderso zanderso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without awaiting the request's done Future, the TimeoutException would be rethrown and caught below as expected, but in addition, abort() will complete the done Future with an error (probably in a later microtask). When a Future is completed with an error, and it doesn't have an error handling callback attached with e.g. Future.catchError or isn't awaited, then the error skips over synchronous try {} catch {} blocks, and is passed to the unhandled exception callback of the containing Zone. We don't want that to happen here because that callback will report the exception to crash logging.

If you aren't seeing the abort() result in an error like that, then that behavior doesn't match the API docs...

Comment on lines 276 to 277
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no newline after '=', two space indent on continued line.

Suggested change
final HttpClientResponse response =
await request.close().timeout(const Duration(milliseconds: 500));
final HttpClientResponse response = await request.close().timeout(
const Duration(milliseconds: 500),
);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@fluttergithubbot
Copy link
Contributor

This pull request is not suitable for automatic merging in its current state.

  • The status or check suite Google testing has failed. Please fix the issues identified (or deflake) before re-applying this label.

'test',
fileSystem.currentDirectory,
fileSystem: fileSystem,
logger: BufferLogger.test(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to take a look at the stdout and stderr from the BufferLogger to make sure their contents is as expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 350 to 362
when(httpRequest.close()).thenAnswer((Invocation invocation) async {
if (!didAbort) {
while (!didAbort) {
await new Future.delayed(const Duration(milliseconds : 10));
}
throw HttpException('aborted');
}
didRetry = true;
return httpClientResponse;
});
when(httpRequest.abort()).thenAnswer((_) {
didAbort = true;
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use completers to avoid the polling loop. This also uses a completer for the result of close(), so that the same Future can be returned for httpRequest.done.

    final Completer<void> abortCompleter = Completer<void>();
    final Completer<HttpClientResponse> responseCompleter = Completer<HttpClientResponse>();
    when(httpRequest.close()).thenAnswer((Invocation invocation) async {
      if (!didAbort) {
        await abortCompleter.future;
        unawaited(Future<void>(() async {
          responseCompleter.completeError(HttpException('aborted'));
        });
        return responseCompleter.future;
      }
      didRetry = true;
      return httpClientResponse;
    });
    when(httpRequest.abort()).thenAnswer((_) {
      didAbort = true;
      abortCompleter.complete();
    });
    when(httpRequest.done).thenReturn(responseCompleter.future);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@gaaclarke gaaclarke requested a review from zanderso September 23, 2020 22:41
@gaaclarke
Copy link
Member Author

Google Tests seems to not have run and it looks like they failed again in CL 333390424.

Copy link
Member

@zanderso zanderso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A negative test that makes sure the tool fails the right way when all 10 retries fail would also be good if we don't already have one of those.

<Completer<MockHttpClientResponse>>[hanger, succeeder];
succeeder.complete(MockHttpClientResponse());

when(httpRequest.close()).thenAnswer((Invocation invocation) async {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to by async, and I don't think you want to await the Future before returning it---the code under test should be doing that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return succeeder.future;
} else {
// This branch shouldn't happen.
assert(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert(false) -> fail('This branch should not happen')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

@gaaclarke gaaclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A negative test that makes sure the tool fails the right way when all 10 retries fail would also be good if we don't already have one of those.

If you don't mind. I'd rather prioritize getting this fix landed for now. It's affecting users and has ran into plenty of roadblocks. I still have to investigate what's happening in "Google Tests" and I don't want to add the opportunity for more issues.

I added an issue for that test so we don't lose it: #66564

@jonahwilliams
Copy link
Contributor

Correctness should be the primary concern here. Cherry picking something like this is especially risky, given that we don't really have integration test coverage for these sorts of failures - we'll be relying on crash reports and user issues to know if the fix worked.

@gaaclarke
Copy link
Member Author

Correctness should be the primary concern here. Cherry picking something like this is especially risky, given that we don't really have integration test coverage for these sorts of failures - we'll be relying on crash reports and user issues to know if the fix worked.

Are you talking about the test that exhausts the retry count? That isn't related to to this change. This change has 100% test code coverage. As far as integration tests, we actually do have them, they are in Google Test and I'm investigating them now.

@zanderso
Copy link
Member

zanderso commented Sep 24, 2020

Two things:

  1. I'm suggesting adding a negative unit test as part of this change because I suspect it might shed light on the failing internal test. If you feel it would be easier to attack the failing internal test directly first, then feel free to reverse the order.
  2. If you are feeling deadline pressure from a person or a process to get something landed quickly, then please include me wherever any relevant discussion is happening.

@gaaclarke
Copy link
Member Author

gaaclarke commented Sep 24, 2020

Two things:

  1. I'm suggesting adding a negative unit test as part of this change because I suspect it might shed light on the failing internal test. If you feel it would be easier to attack the failing internal test directly first, then feel free to reverse the order.
  2. If you are feeling deadline pressure from a person or a process to get something landed quickly, then please include me wherever any relevant discussion is happening.

Thanks, I don't want to be misunderstood I'm not trying to shirk responsibility, I'm being economical with my time and respectful of our user's time. I have responsibilities beyond this PR. If a task is suggested, no matter how good the suggestion is, if it doesn't advance the fixing of this issue it needs to be prioritized against all the other responsibilities I have. In this case I don't see a reasonable theory that the suggested test will help Google Tests pass or cover an issue that could arise from the code change. If I'm missing something please let me know. I think it's a good idea, that's why I made the issue.

@jmagman
Copy link
Member

jmagman commented Sep 24, 2020

Mac tool_tests failure in an infra failure in LUCI, it passed on Cirrus.

Copy link
Member

@jmagman jmagman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's merge this.

@jmagman
Copy link
Member

jmagman commented Sep 24, 2020

LGTM, let's merge this.

Hm I thought FRoB had run again, but it didn't? Do we know if those TAP tests are now passing?

@gaaclarke
Copy link
Member Author

@jmagman I need a few more hours to investigate the Google Test failures that aren't showing up in the CI list. I don't think they are critical failures but I want to make sure. The one I investigated yesterday was failing because of extra stderr output which I can't reproduce locally.

@jmagman
Copy link
Member

jmagman commented Sep 24, 2020

@gaaclarke Sorry to rush you, I had thought the Google tests were passing, but they just didn't run.

@gaaclarke
Copy link
Member Author

I was able to reproduce Google Test basic_runner_attach_android_test_pixel_3_xl_google_29_x86. I was able to verify that the test is working before this PR and afterwards all assets appear to fail to load. Those tests run on slow machines, hopefully it is just something running slow and there isn't an error in my assessment that that synchronization step isn't bounded by asset size.

[  +12 ms] <- compile package:mobile.flutter.tests.app/main.dart
[+47873 ms] Updating files
[ +784 ms] Error writing "build/flutter_assets/AssetManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 9 more attempts left
[  +88 ms] Error writing "build/flutter_assets/FontManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 9 more attempts left
[ +567 ms] Error writing "build/flutter_assets/MaterialIcons-Extended.ttf" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 9 more attempts left
[ +843 ms] Error writing "build/flutter_assets/AssetManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 8 more attempts left
[ +155 ms] Error writing "build/flutter_assets/FontManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 8 more attempts left
[ +839 ms] Error writing "build/flutter_assets/MaterialIcons-Extended.ttf" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 8 more attempts left
[ +598 ms] Error writing "build/flutter_assets/AssetManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 7 more attempts left
[        ] Error writing "build/flutter_assets/FontManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 7 more attempts left
[+1008 ms] Error writing "build/flutter_assets/MaterialIcons-Extended.ttf" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 7 more attempts left
[ +343 ms] Error writing "build/flutter_assets/AssetManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 6 more attempts left
[        ] Error writing "build/flutter_assets/FontManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 6 more attempts left
[+1067 ms] Error writing "build/flutter_assets/MaterialIcons-Extended.ttf" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 6 more attempts left
[ +190 ms] Error writing "build/flutter_assets/AssetManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 5 more attempts left
[        ] Error writing "build/flutter_assets/FontManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 5 more attempts left
[+1407 ms] Error writing "build/flutter_assets/MaterialIcons-Extended.ttf" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 5 more attempts left
[ +154 ms] Error writing "build/flutter_assets/AssetManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 4 more attempts left
[        ] Error writing "build/flutter_assets/FontManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 4 more attempts left
[ +542 ms] Error writing "lib/main.dart.dill" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 9 more attempts left
[ +469 ms] Error writing "build/flutter_assets/MaterialIcons-Extended.ttf" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 4 more attempts left
[   +9 ms] Error writing "build/flutter_assets/AssetManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 3 more attempts left
[  +23 ms] Error writing "build/flutter_assets/FontManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 3 more attempts left
[+1393 ms] Error writing "build/flutter_assets/AssetManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 2 more attempts left
[  +10 ms] Error writing "build/flutter_assets/FontManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 2 more attempts left
[ +120 ms] Error writing "build/flutter_assets/MaterialIcons-Extended.ttf" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 3 more attempts left
[+1082 ms] Error writing "build/flutter_assets/AssetManifest.json" to DevFS: HttpException: Request has been aborted
[  +11 ms] trying again in a few - 1 more attempts left
[        ] Error writing "build/flutter_assets/FontManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 1 more attempts left
[ +462 ms] Error writing "build/flutter_assets/MaterialIcons-Extended.ttf" to DevFS: HttpException: Request has been aborted
[   +1 ms] trying again in a few - 2 more attempts left
[ +688 ms] Error writing "build/flutter_assets/AssetManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 0 more attempts left
[        ] Error writing "build/flutter_assets/FontManifest.json" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 0 more attempts left
[ +617 ms] Error writing "build/flutter_assets/MaterialIcons-Extended.ttf" to DevFS: HttpException: Request has been aborted
[        ] trying again in a few - 1 more attempts left
[ +892 ms] Error writing "build/flutter_assets/AssetManifest.json" to DevFS: HttpException: Request has been aborted
stderr: [   +1 ms] Could not update files on device: HttpException: Request has been aborted
[   +8 ms] Syncing files to device Pixel 3 XL Android... (completed in 62,374ms, longer than expected)

@gaaclarke
Copy link
Member Author

gaaclarke commented Sep 24, 2020

I printed out the times those close functions take to finish in milliseconds with the size of the payload in bytes:

request.close size:78 time: 1499
request.close size:229308 time: 1850
request.close size:61 time: 1948
request.close size:12848960 time: 5294

That's very different than the local ~10ms I was seeing.

Notice that each one is larger than the last. I think this is an artifact of the result that we are parallelizing the sending of the files, but the server is handling them serially. That means payload size does indirectly affect the time it takes to wait for the status code (not in network cost, but processing cost). It also means that retry attempts will have lower likelihood of timing out as the server load lessens (assuming some tasks converge).

These tests are also running on server farms with emulated devices so I expect their numbers to be higher than anyone would experience in any sane setup. This speaks to the need to find the root cause in the server even more. A safe number would be 10x the observed max which would be 50 seconds, which is still better than the current 4 minute timeout. It is hardly an acceptable pause to receive 20% of the time.

updated the timeout documentation
reduced the max number of inflight uploads
@gaaclarke
Copy link
Member Author

@jonahwilliams @zanderso Please give this another look. I increased the timeout to 10 seconds based on my observations in Google Tests, trying to balance safe and considerable. I also reduced the max inflight uploads to limit the variance in the time the request.close takes (since it appears to be waiting for the serial handling of the server of PUT requests).

Copy link
Member

@zanderso zanderso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm on green presubmits

@gaaclarke gaaclarke merged commit 76ad864 into flutter:master Sep 25, 2020
@mehmetf
Copy link
Contributor

mehmetf commented Oct 19, 2020

@gaaclarke

Unfortunately, this had a side effect for developers deploying large apps over proxied WFH environments. There are legitimate cases where the sync takes over 10 seconds. We should either revert this or increase the timeout. I believe 60 seconds is reasonable. I would consider any dev cycle that takes over 60 seconds as unusable. See b/171005910.

@jonahwilliams
Copy link
Contributor

It should be safe to revert. The problem this was trying to solve was syncing assets during the initial connection, after #66742 we no longer need to sync any assets

@jonahwilliams
Copy link
Contributor

Increasing to 60 seconds would be a better change though, then we're not relying on the OS level timeout

@gaaclarke
Copy link
Member Author

#68488 increased the timeout to 1 minute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tool Affects the "flutter" command-line tool. See also t: labels.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

iOS VM services takes 4 min to attach when network dialog shows

8 participants