Terminology
This doesn't have to stick, I just need it to make writing the issue easier.
- Navigated (Navigation?, Document?) URL: The last URL the browser performed a hard navigation to.
  - Lighthouse resolves this URL by tracking CDP Page.frameNavigated events.
  - Can only be determined in Lighthouse if there was a navigation.
- Frame URL: The URL that appears in the search bar. Can be changed with history.pushState or anchor links without making an additional network request or performing a hard navigation.
  - Lighthouse legacy navigation runner does not use this URL anywhere.
  - URL can be queried using Page.getFrameTree.
  - Lighthouse FR runners can resolve this URL using the Puppeteer page.url() function (wrapped by driver.url()).
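To make the distinction concrete, here is a minimal sketch of how the two URLs drift apart. The `PageEvent` type and `trackUrls` function are hypothetical simplifications for illustration. They are not the real CDP payloads or anything in Lighthouse.

```typescript
// Hypothetical, simplified page events; NOT the real CDP event shapes.
type PageEvent =
  | {type: 'frameNavigated'; url: string} // hard navigation of the main frame
  | {type: 'pushState'; url: string} // history.pushState, no network request
  | {type: 'anchor'; hash: string}; // same-document anchor link

interface UrlState {
  /** Last URL of a hard navigation; undefined if none was observed. */
  navigatedUrl?: string;
  /** URL currently shown for the main frame. */
  frameUrl: string;
}

function trackUrls(initialFrameUrl: string, events: PageEvent[]): UrlState {
  const state: UrlState = {frameUrl: initialFrameUrl};
  for (const event of events) {
    switch (event.type) {
      case 'frameNavigated':
        // A hard navigation updates both URLs.
        state.navigatedUrl = event.url;
        state.frameUrl = event.url;
        break;
      case 'pushState':
        // Only the frame URL changes; resolve relative to the current one.
        state.frameUrl = new URL(event.url, state.frameUrl).href;
        break;
      case 'anchor': {
        // Only the fragment of the frame URL changes.
        const next = new URL(state.frameUrl);
        next.hash = event.hash;
        state.frameUrl = next.href;
        break;
      }
    }
  }
  return state;
}
```

If no `frameNavigated` event is observed (as in a timespan or snapshot), `navigatedUrl` stays `undefined` and only the frame URL is available.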
Problem
For navigations, gatherers need to know the navigated URL in order to find the main document, and there can be issues if the frame URL is provided instead. #13699 will ensure consistent use of the navigation URL for navigation mode. However, timespan and snapshot mode cannot resolve the navigated URL without Page.frameNavigated events, so they must use the frame URL from page.url() instead.
In the LHR requestedUrl/finalUrl, we use the nav URL, which can be confusing to the end user who probably expects the frame URL (#13697). Again, this does not apply to timespan/snapshot, which have to use the frame URL everywhere.
Once #13699 is merged, the following table shows which "type" of URL will be returned from each source:
| | Gather context.url | artifacts.URL | lhr.requestedUrl / lhr.finalUrl | driver.url() / page.url() |
| --- | --- | --- | --- | --- |
| Legacy | Navigation URL | Navigation URL | Navigation URL | N/A |
| Navigation | Navigation URL | Navigation URL | Navigation URL | Frame URL |
| Timespan | Frame URL | Frame URL | Frame URL | Frame URL |
| Snapshot | Frame URL | Frame URL | Frame URL | Frame URL |
Solution
To ensure the "type" of URL is consistent in all three modes, I propose the following setup:
| | Gather context.url | artifacts.URL.* | lhr.requestedUrl | lhr.finalUrl | driver.url() / page.url() |
| --- | --- | --- | --- | --- | --- |
| Legacy | Deprecated | See below | Navigation URL | Frame URL | N/A |
| Navigation | Deprecated | See below | Navigation URL | Frame URL | Frame URL |
| Timespan | Deprecated | See below | N/A | Frame URL | Frame URL |
| Snapshot | Deprecated | See below | N/A | Frame URL | Frame URL |
New URL artifact:
interface URL {
/** URL of the main frame before Lighthouse starts */
initialUrl: string;
/** URL of the first document request during a Lighthouse navigation. `undefined` in timespan/snapshot modes. */
requestedUrl?: string;
/** URL of the last document request during a Lighthouse navigation. `undefined` in timespan/snapshot modes. */
mainDocumentUrl?: string;
/** URL of the main frame after Lighthouse finishes */
finalUrl: string;
}
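As a rough sketch of how this artifact could be populated per mode: the `URLArtifact` name (renamed here to avoid shadowing the global `URL`), the `UrlObservations` input shape, and `buildUrlArtifact` are all hypothetical and only illustrate the field semantics described in the comments above.

```typescript
type GatherMode = 'navigation' | 'timespan' | 'snapshot';

// Same fields as the proposed URL artifact, under a hypothetical name.
interface URLArtifact {
  initialUrl: string;
  requestedUrl?: string;
  mainDocumentUrl?: string;
  finalUrl: string;
}

// Hypothetical input: what a runner is assumed to have observed.
interface UrlObservations {
  /** Frame URL before Lighthouse starts. */
  frameUrlBefore: string;
  /** Frame URL after Lighthouse finishes. */
  frameUrlAfter: string;
  /** Document request URLs in request order; only meaningful in navigation mode. */
  documentRequestUrls: string[];
}

function buildUrlArtifact(mode: GatherMode, obs: UrlObservations): URLArtifact {
  // initialUrl and finalUrl are frame URLs and exist in every mode.
  const artifact: URLArtifact = {
    initialUrl: obs.frameUrlBefore,
    finalUrl: obs.frameUrlAfter,
  };
  // Timespan and snapshot cannot resolve a navigated URL, so stop here.
  if (mode !== 'navigation') return artifact;

  const docs = obs.documentRequestUrls;
  const first = docs[0];
  const last = docs[docs.length - 1];
  if (first === undefined || last === undefined) {
    throw new Error('Navigation mode requires at least one document request');
  }
  // First document request vs. last one (they differ after a redirect).
  return {...artifact, requestedUrl: first, mainDocumentUrl: last};
}
```

With a redirect from `http://` to `https://` followed by an anchor jump, `requestedUrl`, `mainDocumentUrl`, and `finalUrl` would all differ, which is the case the separate fields are meant to capture.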
Some notes on the above proposal:
- Gather context.url is deprecated because we can get the Nav URL from artifacts.URL and the frame URL from driver.url().
- lhr.requestedUrl will be an optional property that only appears on navigation LHRs.
- Add new initialUrl to be a staple of every artifacts.URL.
  - Frame URL
  - Would be about:blank on most navigations.
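The optional lhr.requestedUrl could be derived from the artifact along these lines. This is only a sketch: `LhrUrlFields` and `toLhrUrlFields` are hypothetical names, and the mapping assumes lhr.finalUrl is taken directly from the artifact's frame-URL field, as the proposal table suggests.

```typescript
// Same fields as the proposed URL artifact, under a hypothetical name.
interface URLArtifact {
  initialUrl: string;
  requestedUrl?: string;
  mainDocumentUrl?: string;
  finalUrl: string;
}

// Hypothetical subset of the LHR covering only the URL fields.
interface LhrUrlFields {
  requestedUrl?: string;
  finalUrl: string;
}

function toLhrUrlFields(url: URLArtifact): LhrUrlFields {
  // finalUrl is the frame URL and is present in every mode.
  const fields: LhrUrlFields = {finalUrl: url.finalUrl};
  // requestedUrl is only set for navigations; omit the key entirely otherwise.
  if (url.requestedUrl !== undefined) {
    fields.requestedUrl = url.requestedUrl;
  }
  return fields;
}
```

Omitting the key, rather than writing `requestedUrl: undefined`, keeps timespan and snapshot LHRs free of the property when serialized or checked with `in`.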
Implementation Plan
- … initialUrl and mainDocumentUrl to the artifacts.URL.
- … artifacts.URL.finalUrl to artifacts.URL.mainDocumentUrl
- … context.url
- … requestedUrl undefined in timespan/snapshot on artifacts.URL and the LHR
- … finalDisplayedUrl and deprecate finalUrl
- … mainDocumentUrl to the LHR
- … initialUrl and hold until we actually need it
Related
#8984