Skip to content

Parse the attributes in a case-insensitive way to comply with the HTML parsing spec.#46748

Merged
ockham merged 7 commits intotrunkfrom
html-tag-processor-attributes-names-should-be-case-insensitive
Jan 2, 2023
Merged

Parse the attributes in a case-insensitive way to comply with the HTML parsing spec.#46748
ockham merged 7 commits intotrunkfrom
html-tag-processor-attributes-names-should-be-case-insensitive

Conversation

@adamziel
Copy link
Copy Markdown
Contributor

Accordingly to
https://html.spec.whatwg.org/multipage/parsing.html#attribute-name-state, the uppercase letters in the attribute names should be consumed as lowercase letters, e.g. <a HREF="#"> should be parsed as a with an attribute href (not HREF).

In particular, the following test case should pass:

$p = new WP_HTML_Tag_Processor( '<div DATA-enabled="true">Test</div>' );
$p->next_tag();
$p->get_attribute( 'data-enabled' );
$this->assertEquals( 'true', $p->get_attribute( 'DATA-enabled' ) );
$this->assertEquals( 'true', $p->get_attribute( 'data-enabled' ) );
$this->assertEquals( 'true', $p->get_attribute( 'DATA-ENABLED' ) );

cc @dmsnell @ockham

@adamziel adamziel added [Type] Bug An existing feature does not function as intended Developer Experience Ideas about improving block and theme developer experience labels Dec 22, 2022
@adamziel adamziel self-assigned this Dec 22, 2022
Copy link
Copy Markdown
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, and it was on my list to verify this works based on what @ockham is doing with it. Thanks for the update.

I've tested this and it looks right and doesn't have any dramatic performance impact (though plausibly it could in some worse-case scenarios). Either way, this is a spec problem and not a convenience, so it wouldn't matter.

Thanks @adamziel

@dmsnell dmsnell force-pushed the html-tag-processor-attributes-names-should-be-case-insensitive branch from b89bed7 to 4507556 Compare December 22, 2022 17:38
@adamziel
Copy link
Copy Markdown
Contributor Author

I forgot about set_attribute and remove_attribute – just accounted for those two.

With set_attribute there's a question of what should happen here:

$p = new WP_HTML_Tag_Processor( '<div DATA-enabled="true">Test</div>' );
$p->next_tag();
$p->set_attribute( 'data-ENABLED', 'abc' );

I went for <div data-enabled="abc">Test</div>' – what do you think @dmsnell ?

@dmsnell
Copy link
Copy Markdown
Member

dmsnell commented Dec 22, 2022

@adamziel in keeping in line with touching as little as possible I propose it should work like so:

  • get_attribute() is case insensitive
  • remove_attribute() is case insensitive
  • set_attribute() respects what it was given

So for set_attribute() we can store the given name with its unique casing, but use the comparable case-insensitive check to see if we are replacing or adding a new attribute.

Updating the value of an attribute that already exists could arguable leave the old name which would touch the HTML less than replacing the name would, but then that takes away the ability to enforce a new naming casing on write.

HTML specifies how we interpret those attributes but it doesn't mandate that attributes are written in lowercase.

…L parsing spec.

Accordingly to
https://html.spec.whatwg.org/multipage/parsing.html#attribute-name-state,
the uppercase letters in the attribute names should be consumed as
lowercase letters, e.g. `<a HREF="#">` should be parsed as `a` with an
attribute `href` (not `HREF`).

In particular, the following test case should pass:

```php
$p = new WP_HTML_Tag_Processor( '<div DATA-enabled="true">Test</div>' );
$p->next_tag();
$p->get_attribute( 'data-enabled' );
$this->assertEquals( 'true', $p->get_attribute( 'DATA-enabled' ) );
$this->assertEquals( 'true', $p->get_attribute( 'data-enabled' ) );
$this->assertEquals( 'true', $p->get_attribute( 'DATA-ENABLED' ) );
```
@dmsnell dmsnell force-pushed the html-tag-processor-attributes-names-should-be-case-insensitive branch from e5f5217 to 048e84a Compare December 22, 2022 23:19
@dmsnell dmsnell force-pushed the html-tag-processor-attributes-names-should-be-case-insensitive branch from 048e84a to 46ecc4b Compare December 22, 2022 23:24
@dmsnell
Copy link
Copy Markdown
Member

dmsnell commented Dec 22, 2022

@adamziel I pushed a commit implementing the behavior I suggested. we can remove those commits if you think it's a mistake.

in addition I updated the comments to quote directly from the spec and linked to a different section which discusses the meaning of the attribute name instead of the description of how to parse them. figured that might be more normative.

Comment thread phpunit/html/wp-html-tag-processor-test.php Outdated
@ockham
Copy link
Copy Markdown
Contributor

ockham commented Jan 2, 2023

@dmsnell I wholeheartedly agree with your reasoning.

I've applied some minor polish (6bd2b93, b5babb1). I'll merge once CI goes green (unless there's any last-minute change requests) 😄

@ockham ockham merged commit ecc4f53 into trunk Jan 2, 2023
@ockham ockham deleted the html-tag-processor-attributes-names-should-be-case-insensitive branch January 2, 2023 11:01
@github-actions github-actions bot added this to the Gutenberg 14.9 milestone Jan 2, 2023
@dmsnell
Copy link
Copy Markdown
Member

dmsnell commented Jan 4, 2023

Thanks both of you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Developer Experience Ideas about improving block and theme developer experience [Type] Bug An existing feature does not function as intended

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants