Distinguish Navigation URLs and Frame URLs #13706

@adamraine

Terminology

This doesn't have to stick, I just need it to make writing the issue easier.

  • Navigated (Navigation?, Document?) URL: The last URL the browser performed a hard navigation to.
    • Lighthouse resolves this URL by tracking CDP Page.frameNavigated events.
    • Can only be determined in Lighthouse if there was a navigation.
  • Frame URL: The URL that appears in the browser's address bar. Can be changed with history.pushState or anchor links without making an additional network request or performing a hard navigation.
    • Lighthouse legacy navigation runner does not use this URL anywhere.
    • URL can be queried using Page.getFrameTree.
    • Lighthouse FR runners can resolve this URL using the Puppeteer page.url() function (wrapped by driver.url()).
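
To make the two terms concrete, here is a minimal sketch of how they can diverge. It is a hypothetical model, not Lighthouse code: the `MainFrameEvent` type and `trackUrls` function are invented for illustration. The event names loosely mirror the CDP events Page.frameNavigated and Page.navigatedWithinDocument; using the latter as the signal for pushState/anchor changes is an assumption of this sketch, not something this issue proposes.

```typescript
// Hypothetical model for illustration only; this is not Lighthouse code.
// Each event is reduced to the single field needed here (the resulting URL).
type MainFrameEvent =
  | {type: 'frameNavigated'; url: string} // hard navigation: a new document loads
  | {type: 'navigatedWithinDocument'; url: string}; // history.pushState or anchor link

interface TrackedUrls {
  /** Last URL the browser hard-navigated to; undefined if no navigation was observed. */
  navigatedUrl?: string;
  /** URL currently shown for the main frame. */
  frameUrl: string;
}

function trackUrls(startingFrameUrl: string, events: MainFrameEvent[]): TrackedUrls {
  const urls: TrackedUrls = {frameUrl: startingFrameUrl};
  for (const event of events) {
    // Only a hard navigation updates the navigated URL.
    if (event.type === 'frameNavigated') {
      urls.navigatedUrl = event.url;
    }
    // Every event, hard or same-document, updates the frame URL.
    urls.frameUrl = event.url;
  }
  return urls;
}

// A navigation followed by a pushState: the two URLs diverge.
const afterNavigation = trackUrls('about:blank', [
  {type: 'frameNavigated', url: 'https://example.com/'},
  {type: 'navigatedWithinDocument', url: 'https://example.com/#/cart'},
]);
console.log(afterNavigation.navigatedUrl, afterNavigation.frameUrl);

// No hard navigation observed (as in timespan/snapshot): only the frame URL is known.
const withoutNavigation = trackUrls('https://example.com/', [
  {type: 'navigatedWithinDocument', url: 'https://example.com/#/cart'},
]);
console.log(withoutNavigation.navigatedUrl, withoutNavigation.frameUrl);
```

In the second call no frameNavigated event is seen, so `navigatedUrl` stays `undefined`, which matches the point above that the navigated URL can only be determined if there was a navigation.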

Problem

For navigations, gatherers need to know the navigated URL in order to find the main document, and there can be issues if the frame URL is provided instead. #13699 will ensure consistent use of the navigation URL for navigation mode. However, timespan and snapshot mode cannot resolve the navigated URL without Page.frameNavigated events, so they must use the frame URL from page.url() instead.
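
As a rough illustration of the kind of issue meant here, the sketch below looks up the main document among recorded network requests by URL. The `SimpleRequest` shape and `findDocumentRequest` helper are hypothetical simplifications, not Lighthouse's actual network record type or lookup logic.

```typescript
// Hypothetical, simplified request record; real network records carry far more data.
interface SimpleRequest {
  url: string;
  resourceType: 'Document' | 'Script' | 'Image';
}

// Finds the document request whose URL exactly matches the given URL.
function findDocumentRequest(
  requests: SimpleRequest[],
  url: string
): SimpleRequest | undefined {
  return requests.find(
    request => request.resourceType === 'Document' && request.url === url
  );
}

const recordedRequests: SimpleRequest[] = [
  {url: 'https://example.com/', resourceType: 'Document'},
  {url: 'https://example.com/app.js', resourceType: 'Script'},
];

// The navigated URL matches the document request that was actually made.
const byNavigatedUrl = findDocumentRequest(recordedRequests, 'https://example.com/');

// After the page calls history.pushState, the frame URL no longer matches
// any request, because no network request was made for it.
const byFrameUrl = findDocumentRequest(recordedRequests, 'https://example.com/checkout');

console.log(byNavigatedUrl !== undefined, byFrameUrl !== undefined);
```

Under these assumptions the lookup by frame URL comes back empty, which is why gatherers that need the main document want the navigated URL.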

The LHR's requestedUrl/finalUrl use the navigation URL, which can be confusing to end users who probably expect the frame URL (#13697). Again, this does not apply to timespan/snapshot, which have to use the frame URL everywhere.

Once #13699 is merged, the following table shows which "type" of URL each source returns in each mode:

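| Mode       | Gather context.url | artifacts.URL  | lhr.requestedUrl / lhr.finalUrl | driver.url() / page.url() |
|------------|--------------------|----------------|---------------------------------|---------------------------|
| Legacy     | Navigation URL     | Navigation URL | Navigation URL                  | N/A                       |
| Navigation | Navigation URL     | Navigation URL | Navigation URL                  | Frame URL                 |
| Timespan   | Frame URL          | Frame URL      | Frame URL                       | Frame URL                 |
| Snapshot   | Frame URL          | Frame URL      | Frame URL                       | Frame URL                 |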

Solution

To ensure the "type" of URL is consistent in all three modes, I propose the following setup:

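| Mode       | Gather context.url | artifacts.URL.* | lhr.requestedUrl | lhr.finalUrl | driver.url() / page.url() |
|------------|--------------------|-----------------|------------------|--------------|---------------------------|
| Legacy     | Deprecated         | See below       | Navigation URL   | Frame URL    | N/A                       |
| Navigation | Deprecated         | See below       | Navigation URL   | Frame URL    | Frame URL                 |
| Timespan   | Deprecated         | See below       | N/A              | Frame URL    | Frame URL                 |
| Snapshot   | Deprecated         | See below       | N/A              | Frame URL    | Frame URL                 |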

New URL artifact:

interface URL {
	/** URL of the main frame before Lighthouse starts */
	initialUrl: string;
	/** URL of the first document request during a Lighthouse navigation. `undefined` in timespan/snapshot modes. */
	requestedUrl?: string;
	/** URL of the last document request during a Lighthouse navigation. `undefined` in timespan/snapshot modes. */
	mainDocumentUrl?: string;
	/** URL of the main frame after Lighthouse finishes */
	finalUrl: string;
}

Some notes on the above proposal:

  • Gather context.url is deprecated because we can get the navigation URL from artifacts.URL and the frame URL from driver.url().
  • lhr.requestedUrl will be an optional property that only appears on navigation LHRs.
  • Add a new initialUrl that is present on every artifacts.URL.
    • It is a frame URL.
    • Would be about:blank on most navigations.
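
A minimal sketch of how the proposed artifact might be filled in per mode is shown below. The `UrlArtifact` interface copies the proposed `URL` interface under a different name to avoid clashing with the global `URL` class. `GatherMode`, `NavigationUrls`, and `buildUrlArtifact` are hypothetical names introduced only for this example.

```typescript
type GatherMode = 'navigation' | 'timespan' | 'snapshot';

// Copy of the proposed artifact shape, renamed for this sketch.
interface UrlArtifact {
  initialUrl: string;
  requestedUrl?: string;
  mainDocumentUrl?: string;
  finalUrl: string;
}

// Hypothetical container for the URLs that only a navigation can provide.
interface NavigationUrls {
  requestedUrl: string;
  mainDocumentUrl: string;
}

function buildUrlArtifact(
  mode: GatherMode,
  initialUrl: string, // frame URL before the run starts
  finalUrl: string, // frame URL after the run finishes
  navigation?: NavigationUrls
): UrlArtifact {
  const artifact: UrlArtifact = {initialUrl, finalUrl};
  if (mode === 'navigation') {
    if (!navigation) {
      throw new Error('Navigation mode requires the navigation URLs');
    }
    // Only navigation mode can know the document request URLs.
    artifact.requestedUrl = navigation.requestedUrl;
    artifact.mainDocumentUrl = navigation.mainDocumentUrl;
  }
  // In timespan/snapshot, requestedUrl and mainDocumentUrl stay undefined.
  return artifact;
}

const navigationArtifact = buildUrlArtifact(
  'navigation',
  'about:blank',
  'https://example.com/#/home',
  {requestedUrl: 'http://example.com/', mainDocumentUrl: 'https://example.com/'}
);
const snapshotArtifact = buildUrlArtifact(
  'snapshot',
  'https://example.com/#/home',
  'https://example.com/#/home'
);
console.log(navigationArtifact, snapshotArtifact);
```

In this sketch a redirect from http to https makes requestedUrl and mainDocumentUrl differ, and a later same-document change makes finalUrl differ from both, so all four fields can hold distinct values in navigation mode.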

Implementation Plan

  • Add initialUrl and mainDocumentUrl to artifacts.URL.
  • Switch audit/computed artifact usages of artifacts.URL.finalUrl to artifacts.URL.mainDocumentUrl
  • [Possibly breaking?] Deprecate context.url
  • [Breaking] Make requestedUrl undefined in timespan/snapshot on artifacts.URL and the LHR
  • [Breaking] Add finalDisplayedUrl and deprecate finalUrl
  • Add mainDocumentUrl to the LHR
  • Remove initialUrl and hold until we actually need it

Related

#8984
