Skip to content

script_thread: HTML parser doesn't set relevant option#36622

Merged
jdm merged 1 commit intoservo:mainfrom
elomscansio:html_parser_option
Apr 27, 2025
Merged

script_thread: HTML parser doesn't set relevant option#36622
jdm merged 1 commit intoservo:mainfrom
elomscansio:html_parser_option

Conversation

@elomscansio
Copy link
Contributor

@elomscansio elomscansio commented Apr 20, 2025

This patch ensures that the Servo HTML parser uses the appropriate TreeBuilderOpts settings
as specified by the HTML specification.

Changes:

  • iframe_srcdoc: Detect if the parsed document's URL scheme is about:srcdoc, and set the parser’s iframe_srcdoc option accordingly.
  • quirks_mode: Use the associated Document's quirks mode to set the parser’s quirks mode flag, improving fragment parsing behavior.
  • scripting_enabled: Add a scripting_enabled method to Document, based on whether it has a browsing context, and set this flag for the parser.

These updates align Servo's parsing behavior more closely with the specification:
https://html.spec.whatwg.org/multipage/parsing.html#the-initial-insertion-mode


  • There are tests for these changes

@elomscansio elomscansio requested a review from gterzian as a code owner April 20, 2025 17:58
@elomscansio
Copy link
Contributor Author

Hello @jdm I have been trying to figure out #35478 but i don't really know what exactly you mean #35478 (comment) and how do go about it.

see

scripting_enabled: bool,

pub(crate) fn set_quirks_mode(&self, new_mode: QuirksMode) {

document.set_quirks_mode(context_document.quirks_mode());

document.set_referrer_policy(referrer_policy);
document.scripting_enabled.set(true);
if final_url.as_str() == "about:srcdoc" {
document.iframe_src.set(origin);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't know what to do here but there is Setsrcdoc

make_setter!(SetSrcdoc, "srcdoc");

@elomscansio elomscansio marked this pull request as draft April 20, 2025 18:14
@jdm
Copy link
Member

jdm commented Apr 21, 2025

We need to set the options of the HTML parser in

let options = TreeBuilderOpts {
, which means making sure we have the information we need in
pub(crate) fn parse_html_document(
. We shouldn't need to set any properties of the document.

@elomscansio elomscansio force-pushed the html_parser_option branch 2 times, most recently from 177f0ae to 7371cac Compare April 21, 2025 18:25
@elomscansio
Copy link
Contributor Author

elomscansio commented Apr 21, 2025

the quirk mode expected by TreeBuilderOpts is not same as document's quirk mode

https://github.com/servo/html5ever/blob/0eace84090599afb7addd0e0266ba8e1775e4bc9/markup5ever/interface/tree_builder.rs#L37 vs

use style::context::QuirksMode;

@jdm
Copy link
Member

jdm commented Apr 21, 2025

the quirk mode expected by TreeBuilderOpts is not same as document's quirk mode

https://github.com/servo/html5ever/blob/0eace84090599afb7addd0e0266ba8e1775e4bc9/markup5ever/interface/tree_builder.rs#L37 vs

use style::context::QuirksMode;

Yes, you will need to match on the document's value and provide the equivalent one from html5ever.

@jdm jdm added the T-linux-wpt Do a try run of the WPT label Apr 22, 2025
@github-actions github-actions bot removed the T-linux-wpt Do a try run of the WPT label Apr 22, 2025
@github-actions
Copy link

🔨 Triggering try run (#14584548533) for Linux (WPT)

@github-actions
Copy link

Test results for linux-wpt from try job (#14584548533):

Flaky unexpected result (17)
  • OK /FileAPI/url/url-with-fetch.any.html (#21517)
    • FAIL [expected PASS] subtest: Revoke blob URL after calling fetch, fetch should succeed

      promise_test: Unhandled rejection with value: object "TypeError: Network error occurred"
      

  • OK /FileAPI/url/url-with-fetch.any.worker.html (#21517)
    • PASS [expected FAIL] subtest: Revoke blob URL after calling fetch, fetch should succeed
  • OK [expected TIMEOUT] /_webgl/conformance/textures/image_bitmap_from_video/tex-2d-rgba-rgba-unsigned_short_4_4_4_4.html
    • PASS [expected NOTRUN] subtest: Overall test
    • FAIL [expected PASS] subtest: WebGL test #1

      assert_true: createImageBitmap(source) failed: "argument could not be converted to any of: HTMLImageElement, HTMLCanvasElement, OffscreenCanvas, CSSStyleValue, Blob, ImageData" expected true got false
      

    • FAIL [expected PASS] subtest: WebGL test #3

      assert_true: createImageBitmap(source) failed: "argument could not be converted to any of: HTMLImageElement, HTMLCanvasElement, OffscreenCanvas, CSSStyleValue, Blob, ImageData" expected true got false
      

  • TIMEOUT [expected OK] /_webgl/conformance/uniforms/out-of-bounds-uniform-array-access.html (#26225)
    • NOTRUN [expected PASS] subtest: Overall test
  • OK /fetch/content-length/api-and-duplicate-headers.any.html (#35873)
    • FAIL [expected PASS] subtest: fetch() and duplicate Content-Length/Content-Type headers

      promise_test: Unhandled rejection with value: object "TypeError: Network error occurred"
      

  • OK /fetch/metadata/generated/css-font-face.sub.tentative.html (#34624)
    • PASS [expected FAIL] subtest: sec-fetch-storage-access - Not sent to non-trustworthy same-origin destination
    • PASS [expected FAIL] subtest: sec-fetch-storage-access - Not sent to non-trustworthy same-site destination
  • OK /html/browsers/browsing-the-web/navigating-across-documents/008.html (#24456)
    • FAIL [expected PASS] subtest: Link with onclick form submit to javascript url and href navigation

      assert_equals: expected "href" but got "click"
      

  • OK /html/browsers/browsing-the-web/navigating-across-documents/javascript-url-return-value-handling-dynamic.html (#28066)
    • FAIL [expected PASS] subtest: 0041 set in href="" targeting a frame and clicked

      assert_equals: expected "A" but got ""
      

    • FAIL [expected PASS] subtest: 0080 00FF set in href="" targeting a frame and clicked

      assert_equals: expected "�ÿ" but got ""
      

    • FAIL [expected PASS] subtest: 0080 00FF 0100 set in href="" targeting a frame and clicked

      assert_equals: expected "�ÿĀ" but got ""
      

    • FAIL [expected PASS] subtest: D83D DE0D set in href="" targeting a frame and clicked

      assert_equals: expected "😍" but got ""
      

    • FAIL [expected PASS] subtest: DE0D 0041 set in href="" targeting a frame and clicked

      assert_equals: expected "\ufffdA" but got ""
      

  • OK /html/browsers/browsing-the-web/navigating-across-documents/navigation-unload-same-origin-fragment.html (#20768)
    • FAIL [expected PASS] subtest: Tests that a fragment navigation in the unload handler will not block the initial navigation

      assert_equals: expected "" but got "#fragment"
      

  • OK /html/browsers/browsing-the-web/navigating-across-documents/refresh/same-document-refresh.html (#34597)
    • PASS [expected FAIL] subtest: Same-Document Referrer from Refresh
  • OK /html/semantics/embedded-content/media-elements/loading-the-media-resource/load-events-networkState.html
    • FAIL [expected PASS] subtest: NETWORK_NO_SOURCE

      assert_equals: expected 3 but got 1
      

  • OK /html/semantics/embedded-content/the-iframe-element/iframe-loading-lazy-reload-location-reload.html (#32595)
    • FAIL [expected PASS] subtest: Reloading iframe loading='lazy' before it is loaded: location.reload

      uncaught exception: Error: assert_equals: expected "http://web-platform.test:8000/html/semantics/embedded-content/the-iframe-element/support/blank.htm?src" but got "about:blank"
      

  • TIMEOUT [expected CRASH] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_nonescaping-1.html (#24066)
  • OK [expected TIMEOUT] /html/semantics/forms/form-submission-0/reparent-form-during-planned-navigation-task.html (#29724)
    • PASS [expected TIMEOUT] subtest: reparent-form-during-planned-navigation-task
  • OK /html/semantics/scripting-1/the-script-element/execution-timing/077.html (#22139)
    • FAIL [expected PASS] subtest: adding several types of scripts through the DOM and removing some of them confuses scheduler

      assert_array_equals: expected property 1 to be "Script #1 ran" but got "Script #3 ran" (expected array ["Script #2 ran", "Script #1 ran", "Script #3 ran", "Script #4 ran"] got ["Script #2 ran", "Script #3 ran", "Script #4 ran", "Script #1 ran"])
      

  • TIMEOUT [expected OK] /resource-timing/nested-context-navigations-iframe.html (#24311)
    • TIMEOUT [expected PASS] subtest: Test that iframe navigations are not observable by the parent, even after history navigations by the parent

      Test timed out
      

    • NOTRUN [expected PASS] subtest: Test that crossorigin iframe navigations are not observable by the parent, even after history navigations by the parent
    • NOTRUN [expected PASS] subtest: Test that cross-site iframe navigations are not observable by the parent, even after history navigations by the parent
    • NOTRUN [expected PASS] subtest: Test that iframe navigations are not observable by the parent
    • NOTRUN [expected PASS] subtest: Test that crossorigin iframe navigations are not observable by the parent
    • NOTRUN [expected PASS] subtest: Test that cross-site iframe navigations are not observable by the parent
    • NOTRUN [expected PASS] subtest: Test that iframe refreshes are not observable by the parent
    • NOTRUN [expected PASS] subtest: Test that crossorigin iframe refreshes are not observable by the parent
    • NOTRUN [expected PASS] subtest: Test that cross-site iframe refreshes are not observable by the parent
  • ERROR /service-workers/idlharness.https.any.html (#36250)
    • PASS [expected TIMEOUT] subtest: ServiceWorkerContainer interface: operation register((TrustedScriptURL or USVString), optional RegistrationOptions)
    • PASS [expected TIMEOUT] subtest: NavigationPreloadManager interface: operation enable()
    • PASS [expected TIMEOUT] subtest: NavigationPreloadManager interface: operation disable()
    • PASS [expected TIMEOUT] subtest: NavigationPreloadManager interface: operation setHeaderValue(ByteString)
    • PASS [expected TIMEOUT] subtest: NavigationPreloadManager interface: operation getState()
Stable unexpected results that are known to be intermittent (20)
  • TIMEOUT /FileAPI/url/url-in-tags-revoke.window.html (#19978)
    • PASS [expected TIMEOUT] subtest: Fetching a blob URL immediately before revoking it works in <script> tags.
  • FAIL [expected PASS] /_mozilla/css/dirty_viewport.html (#13731)
  • OK /_webgl/conformance/textures/misc/texture-upload-size.html (#21770)
    • FAIL [expected PASS] subtest: WebGL test #44

      assert_true: could not create image (SVG) expected true got false
      

    • PASS [expected FAIL] subtest: WebGL test #52
  • TIMEOUT /content-security-policy/generic/only-valid-whitespaces-are-allowed.html (#36505)
    • PASS [expected FAIL] subtest: Should load image without any CSP - meta tag
    • PASS [expected FAIL] subtest: Should load image without any CSP - HTTP header
    • FAIL [expected PASS] subtest: Should not load image with 'none' CSP - meta tag

      assert_equals: expected "img not loaded" but got "img loaded"
      

    • FAIL [expected PASS] subtest: Should not load image with 'none' CSP - HTTP header

      assert_equals: expected "img not loaded" but got "img loaded"
      

    • FAIL [expected PASS] subtest: U+0009 TAB should be properly parsed between directive name and value - meta tag

      assert_equals: expected "img not loaded" but got "img loaded"
      

    • FAIL [expected PASS] subtest: U+0009 TAB should be properly parsed between directive name and value - HTTP header

      assert_equals: expected "img not loaded" but got "img loaded"
      

    • FAIL [expected PASS] subtest: U+000C FF should be properly parsed between directive name and value - meta tag

      assert_equals: expected "img not loaded" but got "img loaded"
      

    • FAIL [expected PASS] subtest: U+000A LF should be properly parsed between directive name and value - meta tag

      assert_equals: expected "img not loaded" but got "img loaded"
      

    • FAIL [expected PASS] subtest: U+000D CR should be properly parsed between directive name and value - meta tag

      assert_equals: expected "img not loaded" but got "img loaded"
      

    • FAIL [expected PASS] subtest: U+0020 SPACE should be properly parsed between directive name and value - meta tag

      assert_equals: expected "img not loaded" but got "img loaded"
      

    • And 12 more unexpected results...
  • FAIL [expected PASS] /css/compositing/mix-blend-mode/mix-blend-mode-video-sibling.html (#32849)
  • FAIL [expected PASS] /css/css-tables/table-cell-overflow-auto-scrolled.html (#35011)
  • OK /fetch/metadata/generated/css-font-face.https.sub.tentative.html (#32732)
    • PASS [expected FAIL] subtest: sec-fetch-dest
    • FAIL [expected PASS] subtest: sec-fetch-user

      promise_test: Unhandled rejection with value: object "Error: Failed to query for recorded headers."
      

  • OK /html/browsers/browsing-the-web/navigating-across-documents/empty-iframe-load-event.html (#29066)
    • PASS [expected FAIL] subtest: Check execution order on load handler
    • PASS [expected FAIL] subtest: Check execution order from nested timeout
  • OK /html/browsers/browsing-the-web/navigating-across-documents/initial-empty-document/iframe-src-aboutblank-navigate-immediately.html (#29048)
    • PASS [expected FAIL] subtest: Navigating to a different document with form submission
  • OK /html/browsers/browsing-the-web/navigating-across-documents/navigation-unload-same-origin.window.html (#29049)
    • FAIL [expected PASS] subtest: Same-origin navigation started from unload handler must be ignored

      assert_equals: expected "?pass" but got "?fail"
      

  • OK /html/browsers/history/the-history-interface/traverse_the_history_3.html (#21383)
    • FAIL [expected PASS] subtest: Multiple history traversals, last would be aborted

      assert_array_equals: Pages opened during history navigation expected property 1 to be 3 but got 1 (expected array [6, 3] got [6, 1])
      

  • TIMEOUT /html/interaction/focus/the-autofocus-attribute/supported-elements.html (#24145)
    • TIMEOUT [expected FAIL] subtest: Element with tabindex should support autofocus

      Test timed out
      

    • NOTRUN [expected TIMEOUT] subtest: Non-HTMLElement should not support autofocus
  • OK [expected TIMEOUT] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_escaping-1.html (#22647)
    • FAIL [expected TIMEOUT] subtest: Check that popups from a sandboxed iframe escape the sandbox if allow-popups-to-escape-sandbox is used

      assert_equals: It came from a sandboxed iframe expected "null" but got "http://web-platform.test:8000"
      

  • OK [expected TIMEOUT] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_nonescaping-2.html (#22154)
    • FAIL [expected NOTRUN] subtest: Check that popups from a sandboxed iframe do not escape the sandbox

      assert_equals: It came from a sandboxed iframe expected "null" but got "http://web-platform.test:8000"
      

  • TIMEOUT /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_popups_nonescaping-3.html (#24066)
    • NOTRUN [expected FAIL] subtest: Check that popups from a sandboxed iframe do not escape the sandbox
  • OK /html/semantics/forms/form-submission-0/text-plain.window.html (#28687)
    • PASS [expected FAIL] subtest: text/plain: Basic File test (formdata event)
    • PASS [expected FAIL] subtest: text/plain: \r\n in name (formdata event)
    • PASS [expected FAIL] subtest: text/plain: single quote in filename (normal form)
  • OK /navigation-timing/test-navigation-type-reload.html (#33334)
    • PASS [expected FAIL] subtest: Reload domComplete > Original domComplete
    • PASS [expected FAIL] subtest: Reload domContentLoadedEventEnd > Original domContentLoadedEventEnd
    • PASS [expected FAIL] subtest: Reload loadEventEnd > Original loadEventEnd
    • PASS [expected FAIL] subtest: Reload loadEventStart > Original loadEventStart
  • OK /resize-observer/eventloop.html (#33599)
    • PASS [expected FAIL] subtest: test0: multiple notifications inside same event loop
  • OK /webaudio/the-audio-api/the-audiobuffersourcenode-interface/sub-sample-buffer-stitching.html (#22849)
    • FAIL [expected PASS] subtest: X Stitched sine-wave buffers at sample rate 43800 does not equal [0,0.06264832615852356,0.12505052983760834,0.18696144223213196,0.24813786149024963,0.308339387178421,0.36732959747314453,0.4248766601085663,0.480754554271698,0.5347436666488647,0.5866320133209229,0.6362156271934509,0.6832997798919678,0.7276994585990906,0.7692402601242065,0.8077589869499207...] with an element-wise tolerance of {"absoluteThreshold":0.0038986,"relativeThreshold":0}. Index Actual Expected AbsError RelError Test threshold [14680] 3.8482816000000000e+7 2.0512369275093079e-1 3.8482815794876307e+7 1.8760785396742857e+8 3.8985999999999999e-3 [14681] 1.1480505578219891e-2 1.4340442419052124e-1 1.3192391861230135e-1 9.1994315626575540e-1 3.8985999999999999e-3 Max AbsError of 3.8482815794876307e+7 at index of 14680. Max RelError of 1.8760785396742857e+8 at index of 14680.

      assert_true: expected true got false
      

    • FAIL [expected PASS] subtest: X SNR (-108.27125089470427 dB) is not greater than or equal to 65.737. Got -108.27125089470427.

      assert_true: expected true got false
      

  • ERROR [expected OK] /webxr/render_state_update.https.html (#27535)

@github-actions
Copy link

✨ Try run (#14584548533) succeeded.

@elomscansio
Copy link
Contributor Author

hello @jdm should i mark this PR as ready for review, OR what's next should I do

@jdm
Copy link
Member

jdm commented Apr 24, 2025

Yes, this no longer needs to be a draft!

@jdm
Copy link
Member

jdm commented Apr 24, 2025

Testing the impact of these changes is complicated—the only impact I can see from the specification is whether the parsed document will use quirks mode or not in various situations, and so any test has to set up the right preconditions, start a new parser, then verify the quirks mode status by checking if some particular quirk is applied. I don't think that effort is worthwhile, so we can go ahead and merge this once the ./mach clippy warning is fixed.

@elomscansio elomscansio marked this pull request as ready for review April 27, 2025 18:19
@jdm jdm enabled auto-merge April 27, 2025 18:21
@jdm jdm added this pull request to the merge queue Apr 27, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 27, 2025
Signed-off-by: Emmanuel Elom <elomemmanuel007@gmail.com>
@jdm jdm added this pull request to the merge queue Apr 27, 2025
github-merge-queue bot pushed a commit that referenced this pull request Apr 27, 2025
This patch ensures that the Servo HTML parser uses the appropriate
`TreeBuilderOpts` settings
as specified by the HTML specification.

Changes:
- **iframe_srcdoc:** Detect if the parsed document's URL scheme is
`about:srcdoc`, and set the parser’s iframe_srcdoc option accordingly.
- **quirks_mode:** Use the associated Document's quirks mode to set the
parser’s quirks mode flag, improving fragment parsing behavior.
- **scripting_enabled:** Add a `scripting_enabled` method to Document,
based on whether it has a browsing context, and set this flag for the
parser.

These updates align Servo's parsing behavior more closely with the
specification:

https://html.spec.whatwg.org/multipage/parsing.html#the-initial-insertion-mode

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by
`[X]` when the step is complete, and replace `___` with appropriate
data: -->
- [X] `./mach build -d` does not report any errors
- [X] `./mach test-tidy` does not report any errors
- [X] These changes fix #35478

<!-- Either: -->
- [ ] There are tests for these changes

Signed-off-by: Emmanuel Elom <elomemmanuel007@gmail.com>
Merged via the queue into servo:main with commit 085ad9e Apr 27, 2025
21 checks passed
@elomscansio elomscansio deleted the html_parser_option branch April 29, 2025 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTML parser doesn't set relevant options

3 participants