Skip to content

♻️ Refactor core discovery queue#369

Merged
wwilsman merged 13 commits intomasterfrom
ww/discovery-queue-refactor
Jun 16, 2021
Merged

♻️ Refactor core discovery queue#369
wwilsman merged 13 commits intomasterfrom
ww/discovery-queue-refactor

Conversation

@wwilsman
Copy link
Copy Markdown
Contributor

@wwilsman wwilsman commented Jun 16, 2021

What is this?

While the current queue system in @percy/core is better than the lazy promise system from @percy/agent, there are still a few performance issues. Namely, multiple browser pages can be opened at the same time for a single snapshot. So while our queue concurrency is set to a default of 5, the 2 widths per snapshot default causes 2 pages to be open per snapshot. Resulting in our true snapshot concurrency being severely limited by the number of widths.

Additionally, when using the capture() method to take DOM snapshots, another browser page is opened to take said snapshot before asset discovery opens pages for all of the widths. While page captures have their own concurrency, this can result in up to 10+ browser pages being opened and closed all at once across the process when capturing more than 5 snapshots.

Finally, while captures and snapshots were managed by their own individual queues, resulting API uploads were not queued. If the capture/snapshot queue drains faster than a few requests can resolve, this could then result in many many in-flight requests which could then clog the process and cause timeouts.

Snapshot, capture, and discovery changes

Rather than open multiple browser pages for multiple widths to discover responsive assets in parallel, we can trigger page resize events sequentially to limit the number of pages that need to be open. This actually results in faster responsive asset discovery for initial resource requests because common requests are no longer duplicated across all widths. While testing, I discovered that in some scenarios the browser will not request smaller assets when a larger one is already loaded. Because of this, widths are sorted with the initial page loading at the smallest width and then resizing to each larger width.

For capturing DOM snapshots with the capture() method, we can eliminate the extraneous browser page by decoupling the snapshot() method and discovering assets simultaneously while capturing the DOM. To achieve this, the discoverer class was broken down into functional methods that can be used by the browser page network when enabling request interception. With this change, the discovery asset cache was moved into the main Percy class and manipulating that cache is now done via request interception resource callbacks.

Now that the capture() method was isolated from the snapshot() method, the logic between them became very similar:

snapshot() {
  // open a page with request interception enabled
  // navigate to a URL and capture assets
  // serve DOM snapshot as root resource
  // schedule snapshot upload
}

capture() {
  // open a page with request interception enabled
  // navigate to a URL and capture assets
  // capture DOM snapshot as root resource
  // schedule snapshot upload
}

Since they were now so similar, these two methods could be merged into a single snapshot() method with one important piece of logic:

// -> if a DOM snapshot was provided, serve it
// -> if no DOM snapshot was provided, capture it

The capture portion of the code remained largely similar to before, but with other asset discovery code now present alongside the capture code, the method became a bit too complex. To alleviate the complexity, capturing of the DOM snapshot was moved into a dedicated page.snapshot() method. The various wait* options were also moved to this method away from the page.goto() method, since those options are only used when capturing a DOM snapshot.

With the percy.snapshot() method now accepting capture options, a snapshot config util was created to validate and merge snapshot options. Part of this validation is ensuring that capture options are not provided along with a static DOM snapshot. The snapshots capture option was also renamed to be additionalSnapshots to avoid confusion with the method name.

Queue changes

Now that captures and snapshots share a method and the discovery process has been broken down, we can combine all three separate queues into a single snapshot queue.

To manage uploads and prevent too many open requests happening at once, an upload queue was also added. The snapshot method will now asynchronously schedule uploads using this queue. This upload queue can also be optionally deferred which allows snapshots to be replaced prior to being sent to Percy. Since uploads can be deferred, it also makes sense to defer build creation along with snapshot uploads. In the future, we can expose this option to allow snapshot retries, or to avoid creating builds when the test suite fails (this would also mean Percy would start at the end of a test run rather than process snapshots as soon as possible).

To facilitate pausing, flushing, removing, and replacing queued tasks, the internal queue class was updated to manage tasks by id. Tasks can also be run according to a priority, which is utilized by the build creation task to ensure it always runs first in the queue. The build creation task also pauses the queue until the build has been created to ensure the build exists before attempting to upload snapshots. Upon build creation failure, the queue is cleared and closed and snapshots will no longer be accepted (the snapshot method will bubble a queue closed error).

Other changes

When the API responds with a 422 build failed error, the queue will also be cleared and closed after logging the error. The @percy/sdk-utils package was updated to catch closed errors and disable future snapshots.

While @percy/client kept build state and mapped one-to-one with a build, build failures are only really tied to closing core queues and preventing future snapshot processing. The client instance does not need to know about this particular state, and other build fetching methods also do not care about state and are not tied to a single build. Therefore it made sense to remove this one-to-one build requirement from client completely. However, this means internal breaking changes to client — now all client methods relating to a build will require a build ID and assertions were added so useful errors are thrown when forgetting to provide one.

With the major parts of core shifting around, other changes were also made to that package. Its src file tree was flattened, util methods were combined, and tests were reorganized to match the new code structure. With both core and client breaking changes, all internal usage amongst the CLI was also updated.

For deferred snapshots, the core stop method was updated to not immediately clear any queued snapshots. It can however, still be told to clear them when the new force argument is true. Caught errors in the CLI are passed along to the command's respective finally() method, which we can leverage to force close the percy instance upon fatal errors.

Various logs were also updated. Redundant start and stop logs, as well as build creation logs were removed; the latter due to potentially being deferred. Logs were also added to client, since we've now long removed Winston from our dependency tree.

wwilsman added 8 commits May 28, 2021 15:35
The queue has been refactored to accept ids for tasks and allow pausing & flushing. This allows us
to add a deferUploads option which, when enabled, will wait until the process is stopped before
creating a build and uploading all snapshots. While deferred, duplicate snapshots are replaced in
the queue which allows SDKs to retry snapshots without duplicate name errors from the upstream API.

The discoverer class has been converted to be functional helpers that the network class can use to
enable request interception. This allows the capture method to intercept resources while taking a
DOM snapshot. With this addition, the capture method no longer requires the snapshot method, however
the two methods then became so similar, that it made sense move the capture functionality to the
snapshot method entirely.

With the snapshot method now accepting capture options, a per-snapshot config util was created to
validate and merge all snapshot options. Part of this validation is ensuring that capture options
are not provided along with a static dom snapshot. The snapshots capture option was also renamed to
be additionalSnapshots to avoid confusion with the snapshot method.

With the major parts of this library shifting around, other changes were also made in
occordance. The src file tree was flattened, util methods were combined, and tests were
reorganized to match the new code structure.
- Caught errors in the base command are passed along to finally, mirroring oclif behavior
- Commands will force close the percy instance when there is an error
- The snapshot command was updated to use the snapshot method
- The upload command was updated to reflect client changes
- Test logs were updated to match new logs
Seems something changed out of our control to cause this value not to be serialized. Ensuring it is
serialized fixes the flakey/failing test for browsers.
@wwilsman wwilsman force-pushed the ww/discovery-queue-refactor branch from 086c671 to c0bff87 Compare June 16, 2021 00:26
@wwilsman wwilsman added 💥 breaking Breaking change ✨ enhancement New feature or request labels Jun 16, 2021
@Robdel12
Copy link
Copy Markdown
Contributor

This upload queue can also be optionally deferred which allows snapshots to be replaced prior to being sent to Percy. Since uploads can be deferred, it also makes sense to defer build creation along with snapshot uploads. In the future, we can expose this option to allow snapshot retries, or to avoid creating builds when the test suite fails (this would also mean Percy would start at the end of a test run rather than process snapshots as soon as possible).

🔥 🔥

connectMessage = connectMessage.join(CRLF);
connectMessage += CRLF.repeat(2);

logger('client:proxy').debug(`Proxying request: ${options.href}`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

@wwilsman wwilsman force-pushed the ww/discovery-queue-refactor branch from c0bff87 to af31704 Compare June 16, 2021 17:46
@wwilsman wwilsman marked this pull request as ready for review June 16, 2021 19:37
@wwilsman wwilsman requested a review from Robdel12 June 16, 2021 19:37
@wwilsman wwilsman enabled auto-merge (squash) June 16, 2021 19:38
@wwilsman wwilsman disabled auto-merge June 16, 2021 19:57
@wwilsman wwilsman force-pushed the ww/discovery-queue-refactor branch 2 times, most recently from d20dd1a to 09dda31 Compare June 16, 2021 20:09
@wwilsman wwilsman force-pushed the ww/discovery-queue-refactor branch from 09dda31 to dc1888e Compare June 16, 2021 20:11
@wwilsman wwilsman enabled auto-merge (squash) June 16, 2021 20:13
Copy link
Copy Markdown
Contributor

@Robdel12 Robdel12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work here! 👏🏼 🔥 A few comments but nothing blocking or anything. Excited to ship this! 🏁


// ensure cleanup is always performed
let cleanup = () => this.finally();
let cleanup = () => this.finally(new Error('SIGTERM'));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious -- what's this for?

Copy link
Copy Markdown
Contributor Author

@wwilsman wwilsman Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When OCLIF catches an error, including calls to exit() (which just throws a specific error), it passes the error along to the finally() method. This way you can handle closing things up in a different way if there was an error versus if finally() was called normally, immediately after run().

To recap part of the PR: with new changes, stopping Percy will wait on snapshots to finish processing and uploading. Unless force is true, in which case it will clear and close all queues as soon as possible.

All of that to answer your question: when the process receives one of these termination events, we want to force stop Percy. So we send finally() an error to signal it to force close any processes. The SIGTERM message is pretty arbitrary as this could trigger from either of the other events here as well (but we don't really care about which event, just that we handle closing as if we got an error).

#### Options

- `name` — Snapshot name (**required**)
- `name` — Snapshot name
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏🏼

@@ -18,16 +43,9 @@ export default class PercyClient {
// versioned percy api url
apiUrl = 'https://percy.io/api/v1'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to get this one in another PR? (Sounds right 👍🏼 )

}

async resize({
deviceScaleFactor = 1,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏🏼 Not sure if this was done before, but all our browsers are scale 1 in the infra. When safari comes along I wonder if that changes 🤨

Comment on lines +370 to +372
let failed = error.response?.status === 422 && (
error.response.body.errors.find(e => (
e.source?.pointer === '/data/attributes/build'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we had this when we were doing support 😆

.toHaveBeenCalledOnceWith(path.join('.downloads', 'v0', 'archive.zip'));
});

it('logs the file size in a readable format', async () => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice addition too 👀

@@ -1,3 +1,7 @@
// utility types
type Without<T, U> = { [P in Exclude<keyof T, keyof U>]?: never };
type XOR<T, U> = (T | U) extends object ? (Without<T, U> & U) | (Without<U, T> & T) : T | U;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hurts to look at.

@wwilsman wwilsman merged commit c63173d into master Jun 16, 2021
@wwilsman wwilsman deleted the ww/discovery-queue-refactor branch June 16, 2021 22:06
samarsault pushed a commit that referenced this pull request Mar 3, 2023
Bumps @percy/core from 1.0.0-beta.52 to 1.0.0-beta.56.

---
updated-dependencies:
- dependency-name: "@percy/core"
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

💥 breaking Breaking change ✨ enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants