Skip to content

Use markup root node as base for parse#2

Closed
aduth wants to merge 2 commits into
masterfrom
update/first-child-source
Closed

Use markup root node as base for parse#2
aduth wants to merge 2 commits into
masterfrom
update/first-child-source

Conversation

@aduth

@aduth aduth commented Aug 17, 2017

Copy link
Copy Markdown
Owner

This pull request seeks to change the behavior of passing a markup string, setting as the base for sourcing the root node of the markup string passed. This improves consistency with the element signature, and provide easier parsing against the root node in environments where the :scope pseudo-class selector is not supported:

const markup = '<div></div>';
const element = document.createElement( 'div' );

// Before:
parse( markup, prop( 'nodeName' ) ); // "BODY"
parse( element, prop( 'nodeName' ) ); // "DIV"

parse( markup, prop( ':scope > div', 'nodeName' ) ); // "DIV"

// After:
parse( markup, prop( 'nodeName' ) ); // "DIV"
parse( element, prop( 'nodeName' ) ); // "DIV"

This may be considered a breaking change, since previously it was possible to pass a markup string containing multiple elements without a parent wrapper. With these changes, the same markup would only perform parsing on the first element. While we could support both single and multiple-element markup strings by testing doc.body.childElementCount (doc.body.children.length), I don't know that this is a common use case and would require supporting multiple input formats. I am content with releasing a new major version describing this as a breaking change.

cc @youknowriad

@youknowriad

Copy link
Copy Markdown

What if we have blocks returning an array of elements? Does this mean we won't be able to parse attributes from these nodes (except the first one?)

@aduth

aduth commented Aug 17, 2017

Copy link
Copy Markdown
Owner Author

What if we have blocks returning an array of elements? Does this mean we won't be able to parse attributes from these nodes (except the first one?)

Actually, you're right, in Gutenberg this could impact a block if its content is composed of multiple top-level elements, like:

<!-- wp:core/text -->
<p>Paragraph one</p>
<p>Paragraph two</p>
<!-- /wp:core/text -->

We could apply a wrapper when parsing, something like '<div>' + rawContent + '</div>'. But this returns us to the original issue, where parsing on the root node of rawContent is difficult in the case that it is a singular element.

Considering hpq alone, the changes here seem to be a good win for consistency, at least between passing markup and an element.

One other option is to convert the markup to an element before passing to hpq, basically recreating the behavior here:

hpq/src/index.js

Lines 19 to 24 in b79c562

// Coerce to element
if ( 'string' === typeof source ) {
const doc = document.implementation.createHTMLDocument( '' );
doc.body.innerHTML = source;
source = doc.body.firstChild;
}

...where we'd then have full control over how that element is created: Either use the first child if only one child, or keep it as body.

I still think this could lead to confusion though, as in the case of the multi-paragraph block and single-paragraph block, where each would have different results for attr( 'class' ) (sourced from body or the paragraph respectively).

@youknowriad

Copy link
Copy Markdown

I understand the feeling of confusion but in the same time let's look at this:

const a = parse( '<p id="something"></p><p id="something2"></p>', attr( 'id' ) );
const b = parse( '<p id="something"></p>', attr( 'id' ) );

If we think about what these should return, I see some logic in having undefined for the first and something for the second one.

@aduth

aduth commented Aug 17, 2017

Copy link
Copy Markdown
Owner Author

I see some logic in having undefined for the first

I could see this, but it's more unexpected in that attr only returns undefined here because the body wrapper we create doesn't have an ID. If you were to do something like prop( 'nodeName' ) instead, the problem is more evident (returns "BODY", not undefined).

Maybe if we had some way to prevent (or ensure undefined return) of singular element querying against the body wrapper, it wouldn't be so bad...

@youknowriad

youknowriad commented Aug 17, 2017

Copy link
Copy Markdown

Maybe if we had some way to prevent (or ensure undefined return) of singular element querying against the body wrapper, it wouldn't be so bad...

Trying to think about the implications, I guess querying (using query) will be a bit odd.

const a = parse( '<p id="something"></p><p id="something2"></p>', query( 'p' ) );
const b = parse( '<p id="something"></p>', query( 'p' ) );

I guess I'd expect both to return the paragraphs, but the second one won't return anything.

@aduth

aduth commented Aug 17, 2017

Copy link
Copy Markdown
Owner Author

The other option for consistency is to always ensure a wrapper, even in the element signature.

const markup = '<div></div>';
const element = document.createElement( 'div' );

// Before:
parse( markup, prop( 'nodeName' ) ); // "BODY"
parse( element, prop( 'nodeName' ) ); // "DIV"

parse( markup, prop( ':scope > div', 'nodeName' ) ); // "DIV"

// After:
parse( markup, prop( 'nodeName' ) ); // "BODY"
parse( element, prop( 'nodeName' ) ); // "BODY"

To your last point, this could make more sense if you consider parsing to need at least some selector scope, and to have undetermined characteristics otherwise.

I think it could be reasonably possible to force an undefined return from attempts to parse the root wrapper:

diff --git a/src/index.js b/src/index.js
index 2aed4b9..42a0c2c 100644
--- a/src/index.js
+++ b/src/index.js
@@ -20,7 +20,8 @@ export function parse( source, matchers ) {
        if ( 'string' === typeof source ) {
                const doc = document.implementation.createHTMLDocument( '' );
                doc.body.innerHTML = source;
-               source = doc.body.firstChild;
+               source = doc.body;
+               source._hpqRoot = true;
        }
 
        // Return singular value
@@ -59,6 +60,8 @@ export function prop( selector, name ) {
                let match = node;
                if ( selector ) {
                        match = node.querySelector( selector );
+               } else if ( node._hpqRoot ) {
+                       return;
                }
 
                if ( match ) {

At which point it becomes:

const markup = '<div></div>';
const element = document.createElement( 'div' );

// Before:
parse( markup, prop( 'nodeName' ) ); // "BODY"
parse( element, prop( 'nodeName' ) ); // "DIV"

parse( markup, prop( ':scope > div', 'nodeName' ) ); // "DIV"

// After:
parse( markup, prop( 'nodeName' ) ); // undefined
parse( element, prop( 'nodeName' ) ); // undefined

Obviously this doesn't solve awkwardness with the (lack of) a :scope pseudo-class selector.

@youknowriad

Copy link
Copy Markdown

I'm hesitant. It's one of these problems where no matter the option you choose, you need to understand how it works (at least a bit) to really know how to use it.

You choose :)

@aduth

aduth commented Aug 17, 2017

Copy link
Copy Markdown
Owner Author

I still think the original proposal here is most sensible and makes for easy parsing of a root node, but acknowledge it causes some difficulty for multi-element blocks in Gutenberg. I don't think we currently have any of those, but the parser supports them. We'd have to consider whether requirements should allow for those sorts of blocks, otherwise eliminate support for it from the parser.

I'll think on this a bit more.

@aduth

aduth commented Aug 17, 2017

Copy link
Copy Markdown
Owner Author

Another thing I don't particularly like about forced selectors is it results in a choice between redundancy and inconsistency with regards to the query matcher. The behavior of query is supposed to mimic parse in a sub-context, but then how is one to get class names for each paragraph with the markup <div><p class="one"></p><p class="two"></p></div>?

  • It must be either inconsistent with root matching and allow omission of the selector:
    • parse( markup, query( 'p', attr( 'class' ) );
  • Or it must be consistent with root matching, but redundant in the 'p' selector:
    • parse( markup, query( 'p', attr( 'p', 'class' ) )

@youknowriad

Copy link
Copy Markdown

@aduth Can't we mimic the :scope behaviour somehow to avoid the browser compatibility issue? Would this solve all the issues here?

@aduth

aduth commented Aug 18, 2017

Copy link
Copy Markdown
Owner Author

@aduth Can't we mimic the :scope behaviour somehow to avoid the browser compatibility issue? Would this solve all the issues here?

Not entirely, since it still leaves inconsistency with query submatching and the parse signature where developer passes an HTMLElement object. Also not something I'm very keen to polyfill 😬

@aduth

aduth commented Oct 6, 2017

Copy link
Copy Markdown
Owner Author

According to MDN, :scope is deprecated?

https://developer.mozilla.org/en-US/docs/Web/CSS/:scope

Not sure if I had missed this previously, or if the recommendation has since changed.

@aduth aduth closed this Mar 25, 2023
@aduth aduth deleted the update/first-child-source branch March 25, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants