Conversation
index.bs
Outdated
| 1. If |configuration|["{{SanitizerConfig/elements}}"] [=map/exists=] and | ||
| |configuration|["{{SanitizerConfig/elements}}"]["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"] | ||
| [=map/exists=] and [=SanitizerConfig/contains=] |attrName|, |
There was a problem hiding this comment.
This is slightly ambiguous. Perhaps a comma before the first "and" removes it. (A more verbose version that also works would have you repeat what contains attrName and remove the duplicate "and".)
There was a problem hiding this comment.
Done, by seperating the clauses with , and if.
|
Indirectly related to this, but I don't want to forget about this. Don't we also need to handle missing properties in set a configuration when handling the input config? We just access Similarly in allow an element etc. we are just appending/removing from potentially missing lists. |
|
Oh, and the way it's currently specified, we can't actually tell the difference between Both cases don't enter the for each, so we can't tell the difference. I guess we need to do something like: For the other lists (removeElements etc.) we should maybe just always explicitly initialize the sanitizer's configuration to an empty list? |
|
Thanks for the review!
Fixed this in the latest push. (Other fixes are still coming...) |
|
"set a configuration" now explicitly initializes any list that exists in the input. So absent and empty lists should now be accurately reflected in the internal config. |
I would have actually preferred to always initialize everything but elements/attributes, but I can also totally see how only having those be optional would be confusing. (It does emphasize the difference between missing and empty being special for these though ...) I think we are still missing an initialization of "attributes" and "remove attributes" in canonicalize a sanitizer element with attributes. |
index.bs
Outdated
| 1. If all of the following are false, | ||
| then [=/remove an attribute|remove=] |attribute| from |child|. | ||
| - |configuration|["{{SanitizerConfig/attributes}}"] [=map/exists=] and | ||
| does not [=SanitizerConfig/contain=] |attrName| |
There was a problem hiding this comment.
Isn't "does not" wrong? If the attribute is contained in SanitizerConfig/attributes (i.e. it should be allowed) this expression would result in false, which would mean removal.
This step might be the most complicated in the whole specification, we should try to brainstorm something to simplify it.
There was a problem hiding this comment.
Isn't "does not" wrong? If the attribute is contained in SanitizerConfig/attributes (i.e. it should be allowed) this expression would result in
false, which would mean removal.
Yes, I think so. I'll update the PR.
This step might be the most complicated in the whole specification, we should try to brainstorm something to simplify it.
I also agree. I tried to keep it simple, but... this is the result. :-)
Alternatives or improvements I can think of:
- Be more aggressive about factoring out common sub-expression. In particular,
if map/exists and SanitizerConfig/containsoccurs a lot. - This is basically an if-then with 3 sub-conditions. One could either move all the conditions into a seperate algorithm. (Which I'm usually not that fond of, because that mostly only introduces an indirection, rather than a proper abstraction.)
- We could resolve the 3-condition if-then into three seperate if-thens. That would repeat the action 3 times, though.
- We could resolve the 3-condition if-then into three seperate if-thens, which skip over the remaining steps of this sequence of steps. I.e., "If [...], then skip over the remaining steps in this sequence," sort of like
continueorbreakdoes for loops. The infra spec defines suitable languages for loops, but not for overly long if-thens. I think we can just describe it in English, though.
wdyt?
There was a problem hiding this comment.
I would make at least one change. Write it like
If all of the following are false:
- a is true;
- b is true; and
- c is true,
then ...
to make it flow a bit better (and be more aligned with Infra). (Might be harder with Markdown, not sure.)
There was a problem hiding this comment.
I followed the 'flow' suggestion here, and simplified the expressions a little bit by merging existance check + lookup into one operation.
index.bs
Outdated
| 1. [=Continue=]. | ||
| 1. If |configuration|["{{SanitizerConfig/removeElements}}"] [=SanitizerConfig/contains=] |elementName|, or if |configuration|["{{SanitizerConfig/elements}}"] is not [=list/empty=] and does not [=SanitizerConfig/contain=] |elementName|: | ||
| 1. If |configuration|["{{SanitizerConfig/removeElements}}"] [=map/exists=] and | ||
| [=SanitizerConfig/contains=] |elementName|, or if |
There was a problem hiding this comment.
I see you sometimes use a nested "if". The usual specification style does not have that.
There was a problem hiding this comment.
Removed "or if" and "and if".
index.bs
Outdated
| 1. If all of the following are false, | ||
| then [=/remove an attribute|remove=] |attribute| from |child|. | ||
| - |configuration|["{{SanitizerConfig/attributes}}"] [=map/exists=] and | ||
| does not [=SanitizerConfig/contain=] |attrName| |
There was a problem hiding this comment.
I would make at least one change. Write it like
If all of the following are false:
- a is true;
- b is true; and
- c is true,
then ...
to make it flow a bit better (and be more aligned with Infra). (Might be harder with Markdown, not sure.)
index.bs
Outdated
| 1. [=SanitizerConfig/Remove=] |element| from the "{{SanitizerConfig/elements}}" | ||
| key of |configuration|. |
There was a problem hiding this comment.
This notation is very confusing. Should this be
Remove element from configuration["elements"]
?
There was a problem hiding this comment.
It is confusing indeed.
The problem is that all those keys are optional and might not be there. The intent is that I can move the existence check into the algorithm that get's called: If configuration doesn't actually have an "elements", then Remove element from configuration["elements"] would run into the assertion at the beginning of [=map/get the value of an entry=]. Here, I handle this by moving the map operation into the algorithm, and can do the existence-check at the beginning of remove.
One alternative would be to just do the existence check plus default value at every calling site. That strikes me as very repetetive.
Ideal would be if there'd be an a[b] operator that would return None (or somesuch). Then the calling algorithm could just treat None in whichever fashion it likes.
There was a problem hiding this comment.
I've prepared a version based on the latter, based on an a[b,default] operation. Not sure how much that improves things, and whether that's at all palatable for eventual inclusion into HTML. On the pro side, it helps with this problem; with the size problem below, and occurs again in the core algorithm issue above. On the con side, I'm inventing another new notation. :-/
Please let my know what you all think. I'm still unsure how to proceed.
There was a problem hiding this comment.
I don't think I like that size can return None. We should probably do something more explicit for "set a configuration"?
Like, roughly:
Did list change?
- If a and b is some, return number of items in a and b are identical.
- Return true if both a and b are none
- Return false
index.bs
Outdated
| <div algorithm> | ||
| To determine the <dfn for="SanitizerConfig">size</dfn> of a [=/list=] |list|: | ||
|
|
||
| 1. If |list| does not [=map/exist=], return None. |
There was a problem hiding this comment.
Using map/exists on a list doesn't make sense.
There was a problem hiding this comment.
Yes. The intent was that this gets called with configuration["bla"] and I'm hoisting the existance check into the caller. I don't think that was valid. I've now prepared a version that's explicit.
Add WPT testcases for parseHTML + parseHTMLUnsafe. Fix <!DOCTYPE> handling, as a follow-up to issue #288 / PR #285. Previously, it was assumed Doc-Type nodes couldn't make it into the sanitizer implementation; but with the change to parseHTML context they can. Spec: WICG/sanitizer-api#288 Spec: WICG/sanitizer-api#285 Bug: 356601280 Change-Id: Iec797b929eb3b90d5f08318bb28964d2f683acb4 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6523521 Reviewed-by: Yifan Luo <lyf@chromium.org> Commit-Queue: Daniel Vogelheim <vogelheim@chromium.org> Cr-Commit-Position: refs/heads/main@{#1457595}
Add WPT testcases for parseHTML + parseHTMLUnsafe. Fix <!DOCTYPE> handling, as a follow-up to issue #288 / PR #285. Previously, it was assumed Doc-Type nodes couldn't make it into the sanitizer implementation; but with the change to parseHTML context they can. Spec: WICG/sanitizer-api#288 Spec: WICG/sanitizer-api#285 Bug: 356601280 Change-Id: Iec797b929eb3b90d5f08318bb28964d2f683acb4 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6523521 Reviewed-by: Yifan Luo <lyf@chromium.org> Commit-Queue: Daniel Vogelheim <vogelheim@chromium.org> Cr-Commit-Position: refs/heads/main@{#1457595}
Add WPT testcases for parseHTML + parseHTMLUnsafe. Fix <!DOCTYPE> handling, as a follow-up to issue #288 / PR #285. Previously, it was assumed Doc-Type nodes couldn't make it into the sanitizer implementation; but with the change to parseHTML context they can. Spec: WICG/sanitizer-api#288 Spec: WICG/sanitizer-api#285 Bug: 356601280 Change-Id: Iec797b929eb3b90d5f08318bb28964d2f683acb4 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6523521 Reviewed-by: Yifan Luo <lyf@chromium.org> Commit-Queue: Daniel Vogelheim <vogelheim@chromium.org> Cr-Commit-Position: refs/heads/main@{#1457595}
|
Could we move the DocumentType change to another PR? |
| <div algorithm> | ||
| To <dfn for="map">maybe get the value</dfn> in an [=ordered map=] | ||
| |map| given a [=map/key=] |key| and an optional |default|: | ||
|
|
||
| 1. If |default| is given and |map| does not [=map/contain=] |key|, | ||
| return |default|. | ||
| 1. Return [=get the value=] in |map| given |key|. | ||
|
|
||
| </div> | ||
|
|
||
| We can also denote [=maybe get the value|maybe getting the value=] using an indexing syntax with two parameters, by providing a [=map/key=] and a default value seperated by comma, inside square brackets directly following a map. | ||
|
|
||
| <div class=example> | ||
| `map["test",None]` would return | ||
| `"ok"` for a [=map=] «["test"→"ok"]», | ||
| and `None` for a [=map=] «["ok"→"test"]». | ||
| </div> |
There was a problem hiding this comment.
I think this is a great idea, but let's discuss this over in https://github.com/whatwg/infra. I suppose we could merge this for now, but I think this needs a bit more scrutiny and wider buy-in before we use it in a standard.
| 1. If |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"] | ||
| [=map/exists=] and [=SanitizerConfig/contains=] |elementName|: |
There was a problem hiding this comment.
I wonder if it would be better to define an operation for these kind of checks.
In particular these contains checks seem somewhat problematic as they don't perform a by-reference comparison but expect a by-value comparison. But we haven't really nailed down comparisons in Infra yet and even the sketch we have at whatwg/infra#664 doesn't want it to be used for maps as you are doing throughout the document.
| 1. [=Assert=]: |list| is a [=/list=] or None. | ||
| 1. If |list| is None, return. |
There was a problem hiding this comment.
Why None and not null? Null is the usual value for this kind of thing. See also Infra.
Done: #289 Will also remove those bits from this PR. |
|
(Sorry, All these comments are getting confusing) Coming back to dreaded attributes removal condition. I still don't think it really works. Imagine 1. Edit: I think I would recommend refactoring this to another pseudo method like this: Should remove attribute with configuration, elementName and attrName?
Mhm. I guess that definition makes it impossible to define a strict list of attributes per element with just |
… fix DOCTYPE handling., Automatic update from web-platform-tests [Sanitizer] Add parseHTML testcases, and fix DOCTYPE handling. Add WPT testcases for parseHTML + parseHTMLUnsafe. Fix <!DOCTYPE> handling, as a follow-up to issue #288 / PR #285. Previously, it was assumed Doc-Type nodes couldn't make it into the sanitizer implementation; but with the change to parseHTML context they can. Spec: WICG/sanitizer-api#288 Spec: WICG/sanitizer-api#285 Bug: 356601280 Change-Id: Iec797b929eb3b90d5f08318bb28964d2f683acb4 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6523521 Reviewed-by: Yifan Luo <lyf@chromium.org> Commit-Queue: Daniel Vogelheim <vogelheim@chromium.org> Cr-Commit-Position: refs/heads/main@{#1457595} -- wpt-commits: 95a60666dc001951821439078be97c0bcb421476 wpt-pr: 52406 Differential Revision: https://phabricator.services.mozilla.com/D249335
… fix DOCTYPE handling., Automatic update from web-platform-tests [Sanitizer] Add parseHTML testcases, and fix DOCTYPE handling. Add WPT testcases for parseHTML + parseHTMLUnsafe. Fix <!DOCTYPE> handling, as a follow-up to issue #288 / PR #285. Previously, it was assumed Doc-Type nodes couldn't make it into the sanitizer implementation; but with the change to parseHTML context they can. Spec: WICG/sanitizer-api#288 Spec: WICG/sanitizer-api#285 Bug: 356601280 Change-Id: Iec797b929eb3b90d5f08318bb28964d2f683acb4 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6523521 Reviewed-by: Yifan Luo <lyf@chromium.org> Commit-Queue: Daniel Vogelheim <vogelheim@chromium.org> Cr-Commit-Position: refs/heads/main@{#1457595} -- wpt-commits: 95a60666dc001951821439078be97c0bcb421476 wpt-pr: 52406 Differential Revision: https://phabricator.services.mozilla.com/D249335
… fix DOCTYPE handling., Automatic update from web-platform-tests [Sanitizer] Add parseHTML testcases, and fix DOCTYPE handling. Add WPT testcases for parseHTML + parseHTMLUnsafe. Fix <!DOCTYPE> handling, as a follow-up to issue #288 / PR #285. Previously, it was assumed Doc-Type nodes couldn't make it into the sanitizer implementation; but with the change to parseHTML context they can. Spec: WICG/sanitizer-api#288 Spec: WICG/sanitizer-api#285 Bug: 356601280 Change-Id: Iec797b929eb3b90d5f08318bb28964d2f683acb4 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6523521 Reviewed-by: Yifan Luo <lyfchromium.org> Commit-Queue: Daniel Vogelheim <vogelheimchromium.org> Cr-Commit-Position: refs/heads/main{#1457595} -- wpt-commits: 95a60666dc001951821439078be97c0bcb421476 wpt-pr: 52406 Differential Revision: https://phabricator.services.mozilla.com/D249335 UltraBlame original commit: 7033756c58b04d96f527d92e4b01d6ba3d4542e6
… fix DOCTYPE handling., Automatic update from web-platform-tests [Sanitizer] Add parseHTML testcases, and fix DOCTYPE handling. Add WPT testcases for parseHTML + parseHTMLUnsafe. Fix <!DOCTYPE> handling, as a follow-up to issue #288 / PR #285. Previously, it was assumed Doc-Type nodes couldn't make it into the sanitizer implementation; but with the change to parseHTML context they can. Spec: WICG/sanitizer-api#288 Spec: WICG/sanitizer-api#285 Bug: 356601280 Change-Id: Iec797b929eb3b90d5f08318bb28964d2f683acb4 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6523521 Reviewed-by: Yifan Luo <lyfchromium.org> Commit-Queue: Daniel Vogelheim <vogelheimchromium.org> Cr-Commit-Position: refs/heads/main{#1457595} -- wpt-commits: 95a60666dc001951821439078be97c0bcb421476 wpt-pr: 52406 Differential Revision: https://phabricator.services.mozilla.com/D249335 UltraBlame original commit: 7033756c58b04d96f527d92e4b01d6ba3d4542e6
… fix DOCTYPE handling., Automatic update from web-platform-tests [Sanitizer] Add parseHTML testcases, and fix DOCTYPE handling. Add WPT testcases for parseHTML + parseHTMLUnsafe. Fix <!DOCTYPE> handling, as a follow-up to issue #288 / PR #285. Previously, it was assumed Doc-Type nodes couldn't make it into the sanitizer implementation; but with the change to parseHTML context they can. Spec: WICG/sanitizer-api#288 Spec: WICG/sanitizer-api#285 Bug: 356601280 Change-Id: Iec797b929eb3b90d5f08318bb28964d2f683acb4 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6523521 Reviewed-by: Yifan Luo <lyfchromium.org> Commit-Queue: Daniel Vogelheim <vogelheimchromium.org> Cr-Commit-Position: refs/heads/main{#1457595} -- wpt-commits: 95a60666dc001951821439078be97c0bcb421476 wpt-pr: 52406 Differential Revision: https://phabricator.services.mozilla.com/D249335 UltraBlame original commit: 7033756c58b04d96f527d92e4b01d6ba3d4542e6
This re-writes config handling and sanitize-core based on 2025-05-28 meeting discussion. This would replace WICG#285.
|
I think we can close this in favor of #296 already. |
Re-work the handling of empty vs missing lists, along the lines discussed in #268. Empty allow-lists now effectively block everything; while a missing allow-list does nothing. For remove-lists, there is no difference and neither form will remove anything.
Examples:
{}=> everything passes{elements: []}=> nothing passesThe error handling is updated to disallow both allow and block lists (for elements & attributes, each). For per-element attributes no such restriction exists; unlike what #281 wants.
Example:
{ attributes: ["src", "srcset", "alt"], elements: [{name: "img", attributes: ["width", "height"], removeAttributes: ["srcset"]}]}This would allow
<img src=... alt=... width=... height=...>, but would drop asrcsetattribute from the element.Fixes #268 and #281
Preview | Diff