docs/en/users/10_filter.md
Additional reading: [De Morgan’s laws](https://en.wikipedia.org/wiki/De_Morgan%27s_laws).

Removed:
> ℹ️ Searches are applied to the raw HTML content

Added:
> ℹ️ Searches are applied to the HTML content, and are automatically XML-encoded (so one can search for `'A & B'` without having to encode the `&`).
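To illustrate the new note: the stored content is HTML, so a literal `&` is kept as `&amp;`, and the search term is encoded the same way before comparison. This is a minimal Python sketch, not FreshRSS code. `matches` is a hypothetical helper, and it assumes that only `&`, `<` and `>` are encoded in text and that matching is a plain substring test.

```python
from html import escape


def matches(stored_html: str, user_query: str) -> bool:
    # Hypothetical helper: encode the query the same way the stored HTML text
    # is assumed to be encoded (& -> &amp;, < -> &lt;, > -> &gt;),
    # then do a plain substring match against the stored HTML.
    encoded = escape(user_query, quote=False)
    return encoded in stored_html


content = "<p>Notes on A &amp; B</p>"
print(matches(content, "A & B"))      # True: the user does not encode the &
print(matches(content, "A &amp; B"))  # False in this sketch: the query gets double-encoded
```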
You call it "XML-encoded" here, but "HTML-encoded" down in the regex section. It's the same encoding, but XML is probably also worth mentioning in the regex section, since searching even a plain-text article is affected (e.g. matching a plain-text title containing `Q&A:` has to be done with `/Q&amp;A:/`). The HTML example down there is still relevant, though.
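A small Python sketch of the point above, assuming (as the comment implies) that regex patterns are run against the stored, encoded text and are not encoded automatically. The title string is made up for illustration.

```python
import re

# Assumption: the stored title is HTML, so a plain-text "Q&A:" is kept as "Q&amp;A:".
stored_title = "Weekly Q&amp;A: feeds and filters"

print(bool(re.search(r"Q&A:", stored_title)))      # False: a literal & is not in the stored text
print(bool(re.search(r"Q&amp;A:", stored_title)))  # True: the pattern matches the encoded form
```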
For the record, the title is also an HTML field, hence the same syntax
> For the record, the title is also an HTML field, hence the same syntax
If I'm not mistaken, it's supported, but it's up to each feed to decide whether to provide it as plain text or HTML, correct? Plain text is just a subset of HTML, except for the 4 characters that have to be escaped, and those also have to be escaped for XML.
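A Python sketch of the "plain text is a subset of HTML" idea: text without special characters is unchanged by escaping, text with them only needs entity encoding, and for text content escaping `&`, `<` and `>` gives the same result whether done the HTML way or the XML way. This uses Python's standard `html` and `xml.sax.saxutils` modules for illustration only; `to_html` is a hypothetical helper, not FreshRSS's sanitizer.

```python
from html import escape as html_escape, unescape
from xml.sax.saxutils import escape as xml_escape


def to_html(plain: str) -> str:
    # Hypothetical helper: plain text with no special characters is already
    # valid HTML; only &, < and > need to become entities in text content.
    return html_escape(plain, quote=False)


plain = "Q&A: <tags> & more"
print(to_html("Hello world"))               # Hello world (unchanged)
print(to_html(plain))                       # Q&amp;A: &lt;tags&gt; &amp; more
print(to_html(plain) == xml_escape(plain))  # True: XML escaping of &, <, > is identical
print(unescape(to_html(plain)) == plain)    # True: the encoding round-trips
```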
I never thought much about it, but I suppose HTML inside the RSS XML isn't double-escaped, is it?
When we sanitize the title (and other text fields), the end result is always HTML. Otherwise we would not be able to display those different fields safely.
Oh duh, or CDATA in the XML.
So when FreshRSS is importing the XML content, does it require CDATA sections for the title and content, or does it unwrap CDATA and decode non-CDATA fields?
All that is handled by the sanitization / normalisation.
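For the CDATA question, a Python sketch of the general XML behaviour: a standard XML parser unwraps CDATA and decodes entities in ordinary text, so both ways a feed might carry the same HTML title yield identical strings before any sanitization step. This uses Python's `xml.etree.ElementTree` purely as an illustration; it is not the parser FreshRSS uses, and the feed snippets are made up.

```python
import xml.etree.ElementTree as ET

# Two ways a feed may carry the same HTML title inside the RSS XML.
escaped = "<item><title>Q&amp;amp;A: &lt;b&gt;news&lt;/b&gt;</title></item>"
cdata = "<item><title><![CDATA[Q&amp;A: <b>news</b>]]></title></item>"

# The parser decodes entities in the first and unwraps the CDATA in the second.
a = ET.fromstring(escaped).findtext("title")
b = ET.fromstring(cdata).findtext("title")
print(a)       # Q&amp;A: <b>news</b>
print(a == b)  # True
```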