Skip to content

🐛 Fix situations where asset discovery might hang#490

Merged
wwilsman merged 10 commits intomasterfrom
ww/networking-issues
Aug 11, 2021
Merged

🐛 Fix situations where asset discovery might hang#490
wwilsman merged 10 commits intomasterfrom
ww/networking-issues

Conversation

@wwilsman
Copy link
Copy Markdown
Contributor

What is this?

This PR addresses a couple of networking issues in asset discovery.

Improved timeout logs

Foremost, while debugging a very minimal reproduction of asset discovery hanging, it was very helpful to know which requests were causing the issue. This resulted in an improved network timeout log, which includes a list of the offending request URLs in debug logs.

With the improved logs, it was easy to see that there was an issue with requests for some frame documents. With some additional, very noisy, event logs (not included), it revealed that once the document's response was received, the subsequent request finished events were never observed.

Site isolation handling

This prompted an investigation into how events are handled with embedded documents, and it was then discovered that Chromium's site-isolation was somehow responsible for the lack of events once the frame was created. Disabling this feature indeed fixed the issue with frame requests hanging, however it felt wrong disabling a security feature like this.

Unrelated, I also found that the TranslateUI feature flag changed to just Translate. Unfortunately, Chromium will change flags/features between versions like this, and in this particular instance we don't happen to have an explicit test for the translate popup that this was initially meant to suppress. I still didn't add a test in this PR since it mostly focuses on network issues and I just happened to mess with this flag during the above investigation.

To avoid disabling site-isolation, we need to be able to observe subframe requests. This is possible with a couple of very small changes. While pages are already automatically created for newly attached targets, those pages were not automatically initialized. If we want to automatically attach and initialize a subframe for request interception, we do not want to resize the subframe when initialized, so page resizing was moved out of page initialization. To automatically attach and initialize subframes: the Target.setAutoAttach command is sent and when a page is created with a parent, it is initialized with inherited options (including network interception options).

Hanging frame requests

This still leaves the issue of the initial frame request not properly getting cleaned up due to never receiving request finished events. The subframe network also does not see this request since the request is already complete at the time of target creation. To clean up such frame requests, they are registered during the response phase if the resulting response has a different frame. The Page.frameDetached event is also watched, and if a frame request has not been handled at the time the event is received, it is assumed to be finished and the appropriate handlers are called.

Hanging redirects

This concludes the bulk of the changes this PR focuses on, however while testing I also noticed a flakey test related to redirects. This flake also resulted in asset discovery hanging, so it felt appropriate to include a fix with other changes related to the same undesired outcome.

While debugging this issue, I found that page navigation was sometimes flaking, and that was fixed by a slight refactor to how lifecycle events are handled during page navigation. Once that was fixed, the real issue revealed itself to be related to recent network additions (#469). With some of the logic that was added, there was a gap in which an intercepted redirect might not be properly registered as being intercepted. This is fixed by simply removing the gap in the if-else conditionals. The redirect test was made more robust as a result, although it always seems related to race-y events rather than complex redirect rules.

🐛 And always pass the minHeight when resizing
This allows request interception bindings to propagate down to subframes
If a request for a document results in a new session, such as for isolated pages, the frame is
detached before the request's loading finished event gets a chance to fire. Requests for new frames
can be tracked and when the frame becomes detached the associated request can be cleaned up.
The existing lifecycle handling combined with the polling nature of navigation resolution could
sometimes result in a situation where a page's lifecycle is reset before navigation can be deemed
complete. We can use the same technique watching for the proper lifecycle as we do for frame
navigation. However, rather than repeating code within the handlers, they were refactored slightly
to be mapped handlers. The navigation trigger was also refactored slightly into a callback that can
be called the moment we start waiting for the navigation handlers to finish.
Do to certain race conditions, intecepted redirects were not properly getting associated with an
intercept id. But since the previous request in the chain had been cleaned up, this could sometimes
cause asset discovery to hang. Adjusting the if-else logic of this handling fixes this issue by
ensuring that if a request is not being handled, it is always set up to be intercepted.
@wwilsman wwilsman added the 🐛 bug Something isn't working label Aug 11, 2021
@wwilsman wwilsman requested a review from Robdel12 August 11, 2021 21:07
Comment on lines +626 to +633
expect(logger.stderr).toContain(jasmine.stringMatching([
'^\\[percy:core] Error: Timed out waiting for network requests to idle.',
'',
' Active requests:',
' -> http://localhost:8000/img.gif',
'',
'(?<stack>(.|\n)*)$'
].join('\n')));
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💜

it('does not hang waiting for embedded isolated pages', async () => {
server.reply('/', () => [200, {
'Content-Type': 'text/html',
'Origin-Agent-Cluster': '?1' // force page isolation
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the smoking gun to reproduce this on localhost

@wwilsman wwilsman enabled auto-merge (squash) August 11, 2021 21:18
Copy link
Copy Markdown
Contributor

@Robdel12 Robdel12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏁 Amazing work, as usual!

we don't happen to have an explicit test for the translate popup that this was initially meant to suppress

Should we create an issue to follow up? This one is likely on me -- I should have really checked the changes between Chrome 87 & 91.

@wwilsman wwilsman merged commit d88a3d0 into master Aug 11, 2021
@wwilsman wwilsman deleted the ww/networking-issues branch August 11, 2021 21:50
@Robdel12
Copy link
Copy Markdown
Contributor

Related issues: #477 #478

samarsault pushed a commit that referenced this pull request Mar 3, 2023
Bumps [@percy/core](https://github.com/percy/cli/tree/HEAD/packages/core) from 1.0.5 to 1.0.8.
- [Release notes](https://github.com/percy/cli/releases)
- [Commits](https://github.com/percy/cli/commits/v1.0.8/packages/core)

---
updated-dependencies:
- dependency-name: "@percy/core"
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🐛 bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants