Skip to content

Improve validation of children of invalid elements#1511

Closed
kienstra wants to merge 1 commit intodevelopfrom
fix/1493-sanitization
Closed

Improve validation of children of invalid elements#1511
kienstra wants to merge 1 commit intodevelopfrom
fix/1493-sanitization

Conversation

@kienstra
Copy link
Copy Markdown
Contributor

@kienstra kienstra commented Oct 14, 2018

Before

As @amedina reported in #1493, when there is an invalid element like <details> and the validation error isn't accepted, its child elements are not validated.

Copying Alberto's test markup from #1493:

<details>
        <summary>Copyright 1999-2018.</summary>
        <p>- by Awesome Author. All Rights Reserved.</p>
        <p>All content and graphics on this web site are the property of the author.</p>
</details>

There was only an error for the <details>, even though the nested <summary> is also invalid:
details-validation-error


On accepting the <details> validation error, the <summary> is also validated. But the parent_name of the <summary> should probably be details, as it's nested in <details>:

details-nested

After

invalid-elements

Approach

I'm not sure this is the right solution, especially in ensuring the parent_name is correct.

All of the existing unit tests passed, except for one where the parent_name is the original parent. Which I think is the expected result.

But adding the ->originalParent to the DOMElement (or DOMText) isn't ideal, as it's not one of its properties.

This will of course require full front-end testing in all 3 template modes, like we've done for the other releases.

Closes #1493

Before, if there is an unaccepted invalid element like <details>,
its child elements are not validated.
Only on accepting the <details> validation error
is the <summary> validated.
So this adds elements to $fragment only if
the fragment will replace the node.
Also, this ensures that the original parent is reported.
Before, an invalid node's children replaced the invalid node,
and the childrens' parent was reassigned.
So this stores the parent in a new property.
This might not be the best strategy.
$child = $node->firstChild;
while ( $child ) {
$fragment->appendChild( $child );
$this->stack[] = $child;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, all children were appended to $fragment. Even though $fragment is only used if $should_replace.

So I think this caused the nodes in $this->stack to have no values, and thus not get validated:

this-stack

As a workaround, this stores the nodes in $children, and only appends them to $fragment if $should_replace.

$expected_errors[] = array(
'node_name' => 'foo',
'parent_name' => 'body',
'parent_name' => 'divs',
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the parent_name should be divs, as that's the parent of <foo>. But feel free to offer a different opinion.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. The same change was required in f084b43 for #3243.

@kienstra
Copy link
Copy Markdown
Contributor Author

Request For Review

Hi @amedina and @westonruter,
Could you please review this PR for #1493?

$child = $node->firstChild;
$children = array();
foreach ( $node->childNodes as $child ) {
$child->originalParent = $node->nodeName; // phpcs:ignore WordPress.NamingConventions.ValidVariableName.NotSnakeCaseMemberVar
Copy link
Copy Markdown
Contributor Author

@kienstra kienstra Oct 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without keeping track of the original parent somehow, children of the stripped element will have a different parent reported.

For example, when this is validated:

<div><invalid-element><inner-invalid></inner-invalid></invalid-element></div>

This wil strip <invalid-element>:

<div><inner-invalid></inner-invalid></div>

So <inner-invalid> will have a reported parent of <div>, even though its original parent was <invalid-element>.

Still, I'm not sure that storing it in $child->originalParent is the right approach.

@kienstra kienstra changed the title [WIP] Improve validation of children of invalid elements Improve validation of children of invalid elements Oct 14, 2018
@kienstra kienstra changed the base branch from 1.0 to develop October 15, 2018 16:25
@westonruter westonruter added this to the v1.1 milestone Nov 30, 2018
@westonruter westonruter removed this from the v1.1 milestone Mar 6, 2019
@westonruter westonruter removed the request for review from amedina May 25, 2019 22:26
Copy link
Copy Markdown
Member

@westonruter westonruter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this. It turned out that the solution presented itself while working on #3137 by switching to a depth-first post-order tree traversal. I've copied the relevant tests to #3243 where this can be wrapped up.

),
'nested_invalid_elements' => array(
array(
'source' => '<div><details><summary><p>Example Summary</p></summary><p>Example expanded text</p></details></div>',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that AMP allows the details element (via ampproject/amphtml#18440), the tests here aren't relevant now.

$expected_errors[] = array(
'node_name' => 'foo',
'parent_name' => 'body',
'parent_name' => 'divs',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. The same change was required in f084b43 for #3243.

);

// Both <details> and <summary> are invalid, so there should be errors for both.
$content[] = '<div><details><summary><p>Example Summary</p></summary><p>Example expanded text</p></details><div>';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted above, these elements are no longer invalid. See ampproject/amphtml#18440.

@westonruter
Copy link
Copy Markdown
Member

Closing in favor of #3243

@westonruter westonruter deleted the fix/1493-sanitization branch September 16, 2019 04:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected validation error parent element when invalid elements are nested

2 participants