Skip to content

Fix support for feeds with XML preample + DTD, entities#915

Merged
jtojnar merged 2 commits intosimplepie:masterfrom
FreshRSS:fix-xml-dtd-entities
Jun 29, 2025
Merged

Fix support for feeds with XML preample + DTD, entities#915
jtojnar merged 2 commits intosimplepie:masterfrom
FreshRSS:fix-xml-dtd-entities

Conversation

@Alkarex
Copy link
Contributor

@Alkarex Alkarex commented Jun 23, 2025

Alternative of #914 with support for XML feeds using HTML entities

Simple feeds with XML preample + DTD are crashing (see example in test) due to regression from 162a3d3
Its implementation is buggy, as it creates a second DOCTYPE declaration even when there is an existing one, which is completely invalid.

Example of feed: https://blog.plover.com/index.rss
Downstream Issue: FreshRSS/FreshRSS#7514

Note that the culprit feature added support for html entities in xml seems to come from a misunderstanding of
https://www.rssboard.org/rss-encoding-examples
HTML entities are only allowed in XML when there is a DTD declaring them. SimplePie is even allowing Atom documents with undeclared HTML entities - which I am not sure is on purpose.

Downstream issue: FreshRSS/FreshRSS#7687

Alternative of simplepie#914 with support for XML feeds using HTML entities

Simple feeds with XML preample + DTD are crashing (see example in test) due to regression from simplepie@162a3d3
Its implementation is buggy, as it creates a second DOCTYPE declaration even when there is an existing one, which is completely invalid.

Example of feed: https://blog.plover.com/index.rss
Downstream Issue: FreshRSS/FreshRSS#7514
Downstream PR: #35

Note that the culprit feature *added support for html entities in xml* seems to come from a misunderstanding of
https://www.rssboard.org/rss-encoding-examples
HTML entities are only allowed in XML when there is a DTD declaring them. SimplePie is even allowing Atom documents with undeclared HTML entities - which I am not sure is on purpose.

Downstream issue: FreshRSS/FreshRSS#7687
@Alkarex Alkarex force-pushed the fix-xml-dtd-entities branch from a0add8f to df3b0dd Compare June 23, 2025 23:00
Alkarex added a commit to FreshRSS/simplepie that referenced this pull request Jun 24, 2025
Upstream: simplepie#915

Alternative of simplepie#914 with support for XML feeds using HTML entities

Simple feeds with XML preample + DTD are crashing (see example in test) due to regression from simplepie@162a3d3 Its implementation is buggy, as it creates a second DOCTYPE declaration even when there is an existing one, which is completely invalid.

Example of feed: https://blog.plover.com/index.rss
Downstream Issue: FreshRSS/FreshRSS#7514

Note that the culprit feature *added support for html entities in xml* seems to come from a misunderstanding of https://www.rssboard.org/rss-encoding-examples
HTML entities are only allowed in XML when there is a DTD declaring them. SimplePie is even allowing Atom documents with undeclared HTML entities - which I am not sure is on purpose.

Downstream issue: FreshRSS/FreshRSS#7687
Alkarex added a commit to Alkarex/FreshRSS that referenced this pull request Jun 24, 2025
fix FreshRSS#7687

FreshRSS/simplepie#37
Upstream: simplepie/simplepie#915

Partial revert of FreshRSS#7515

HTML entities are normally only allowed in XML when there is a DTD declaring them. SimplePie is even allowing Atom documents with undeclared HTML entities - which I am not sure is on purpose.
Comment on lines +4317 to +4325
<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://www.rssboard.org/rss-0.91.dtd">
<rss version="0.91">
<channel>
<item>
<title>Item Title</title>
</item>
</channel>
</rss>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This simple feed could not be parsed

Alkarex added a commit to FreshRSS/FreshRSS that referenced this pull request Jun 24, 2025
fix #7687

FreshRSS/simplepie#37
Upstream: simplepie/simplepie#915

Partial revert of #7515

HTML entities are normally only allowed in XML when there is a DTD declaring them. SimplePie is even allowing Atom documents with undeclared HTML entities - which I am not sure is on purpose.
Copy link
Member

@jtojnar jtojnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks reasonable and the tests are great, just a single nit.

@jtojnar jtojnar added this to the 1.9.0 milestone Jun 29, 2025
,
'Item Title',
],
'Test RSS 0.91 XML and DTD and external HTML entity' => [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTML entity

The RSS 0.91 DTD actually copies the entity list from HTML 3.2 and appears to be a subset of declare_html_entities.

<rss version="0.91">
<channel>
<item>
<title>caf&eacute;</title>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weirdly, the entities from declare_html_entities appear to be substituted, even though the DOCTYPE within uses html as root element. 🤷‍♀️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was also surprised this worked at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to rss

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XML
,
'café',
],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, the following should work as well but it is probably to niche to care about:

Suggested change
],
],
'Test RSS 0.91 DTD and external HTML entity and internal entity' => [
<<<XML
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://www.rssboard.org/rss-0.91.dtd" [
<!ENTITY adios "Adi&oacute;s">
]>
<rss version="0.91">
<channel>
<item>
<title>&adios; Adi&oacute;s</title>
</item>
</channel>
</rss>
XML
,
'Adiós Adiós',
],

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but merging DTDs do not look easy as they may contain more complicated syntax

@pull-request-size pull-request-size bot added size/L and removed size/M labels Jun 29, 2025
@jtojnar jtojnar merged commit 0340309 into simplepie:master Jun 29, 2025
10 checks passed
@jtojnar
Copy link
Member

jtojnar commented Jun 29, 2025

Thanks.

alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Aug 20, 2025
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [freshrss/freshrss](https://freshrss.org/) ([source](https://github.com/FreshRSS/FreshRSS)) | minor | `1.26.3` -> `1.27.0` |

---

### Release Notes

<details>
<summary>FreshRSS/FreshRSS (freshrss/freshrss)</summary>

### [`v1.27.0`](https://github.com/FreshRSS/FreshRSS/blob/HEAD/CHANGELOG.md#2025-08-18-FreshRSS-1270)

[Compare Source](FreshRSS/FreshRSS@1.26.3...1.27.0)

- Features
  - Implement support for HTTP `429 Too Many Requests` and `503 Service Unavailable`, obey `Retry-After` [#&#8203;7760](FreshRSS/FreshRSS#7760)
  - Add sort by category title, or by feed title [#&#8203;7702](FreshRSS/FreshRSS#7702)
  - Add search operator `c:` for categories like `c:23,34` or `!c:45,56` [#&#8203;7696](FreshRSS/FreshRSS#7696)
  - Custom feed favicons [#&#8203;7646](FreshRSS/FreshRSS#7646), [#&#8203;7704](FreshRSS/FreshRSS#7704), [#&#8203;7717](FreshRSS/FreshRSS#7717),
    [#&#8203;7792](FreshRSS/FreshRSS#7792)
  - Rework fetch favicons for fewer HTTP requests [#&#8203;7767](FreshRSS/FreshRSS#7767)
  - Add more unicity criteria based on title and/or content [#&#8203;7789](FreshRSS/FreshRSS#7789)
  - Automatically restore user configuration from backup [#&#8203;7682](FreshRSS/FreshRSS#7682)
  - API add support for states in `s` parameter of `streamId` [#&#8203;7695](FreshRSS/FreshRSS#7695)
  - Improve sharing via Print [#&#8203;7728](FreshRSS/FreshRSS#7728)
  - Redirect to the login page from bookmarklet instead of 403 [#&#8203;7782](FreshRSS/FreshRSS#7782)
  - Clean local cache more often, when refreshing feeds [#&#8203;7827](FreshRSS/FreshRSS#7827)
- Security
  - Implement reauthentication (*sudo* mode) [#&#8203;7753](FreshRSS/FreshRSS#7753)
  - Add `Content-Security-Policy: frame-ancestors` [#&#8203;7677](FreshRSS/FreshRSS#7677)
  - Ensure CSP everywhere [#&#8203;7810](FreshRSS/FreshRSS#7810)
  - Show warning when unsafe CSP policy is in use [#&#8203;7804](FreshRSS/FreshRSS#7804)
  - Fix access rights when creating a new user [#&#8203;7783](FreshRSS/FreshRSS#7783)
  - Improve security of form for user details [#&#8203;7771](FreshRSS/FreshRSS#7771), [#&#8203;7786](FreshRSS/FreshRSS#7786)
  - Disallow setting non-existent theme [#&#8203;7722](FreshRSS/FreshRSS#7722)
  - Regenerate cookie ID after logging out [#&#8203;7762](FreshRSS/FreshRSS#7762)
  - Require current password when setting new password [#&#8203;7763](FreshRSS/FreshRSS#7763)
  - Add missing access checks for feed-related actions [#&#8203;7768](FreshRSS/FreshRSS#7768)
  - Strip more unsafe attributes such as `referrerpolicy`, `ping` [#&#8203;7770](FreshRSS/FreshRSS#7770)
  - Remove unneeded execution permissions [#&#8203;7802](FreshRSS/FreshRSS#7802)
- Bug fixing
  - Fix redirections when scraping from HTML [#&#8203;7654](FreshRSS/FreshRSS#7654), [#&#8203;7741](FreshRSS/FreshRSS#7741)
  - Fix multiple authentication HTTP headers [#&#8203;7703](FreshRSS/FreshRSS#7703)
  - Fix HTML queries with a single feed [#&#8203;7730](FreshRSS/FreshRSS#7730)
  - WebSub: only perform a redirection when coming from WebSub [#&#8203;7738](FreshRSS/FreshRSS#7738)
  - Include enclosures in entries’ hash [#&#8203;7719](FreshRSS/FreshRSS#7719)
    - Negative side-effect: users of the option to *automatically mark updated articles as unread* will once have some articles with enclosures re-appear as unread
  - Fix cancellation of slider exit UI [#&#8203;7705](FreshRSS/FreshRSS#7705)
  - Honor *disable update* on update page [#&#8203;7733](FreshRSS/FreshRSS#7733)
  - Fix no registration limit setting [#&#8203;7751](FreshRSS/FreshRSS#7751)
  - Fix XML encoding of sharing functions [#&#8203;7822](FreshRSS/FreshRSS#7822)
- SimplePie
  - Fix propagation of HTTP error codes [#&#8203;7670](FreshRSS/FreshRSS#7670)
  - Fix support for XML feeds with HTML entities [#&#8203;7689](FreshRSS/FreshRSS#7689), [simplepie#915](simplepie/simplepie#915)
  - Fix feeds encoded in UTF-16LE [#&#8203;7691](FreshRSS/FreshRSS#7691), [simplepie#916](simplepie/simplepie#916)
  - Various upstream contributions [simplepie#917](simplepie/simplepie#917), [simplepie#924](simplepie/simplepie#924),
    [simplepie#926](simplepie/simplepie#926), [simplepie#932](simplepie/simplepie#932), [simplepie#933](simplepie/simplepie#933)
  - Sync upstream [#&#8203;7706](FreshRSS/FreshRSS#7706), [FreshRSS/simplepie#45](FreshRSS/simplepie#45), [#&#8203;7775](FreshRSS/FreshRSS#7775),
    [FreshRSS/simplepie#50](FreshRSS/simplepie#50), [#&#8203;7824](FreshRSS/FreshRSS#7824), [#&#8203;7825](FreshRSS/FreshRSS#7825),
  - Fix regex *Backtrack limit was exhausted* in `clean_hash()` [#&#8203;7813](FreshRSS/FreshRSS#7813), [FreshRSS/simplepie#48](FreshRSS/simplepie#48)
- Deployment
  - Docker default image (Debian 12 Bookworm) updated to PHP 8.2.29 [#&#8203;7805](FreshRSS/FreshRSS#7805)
  - Docker alternative image updated to Alpine 3.22 with PHP 8.4.11 and Apache 2.4.65 [#&#8203;7740](FreshRSS/FreshRSS#7740), [#&#8203;7740](FreshRSS/FreshRSS#7740),
    [#&#8203;7803](FreshRSS/FreshRSS#7803)
  - Start supporting PHP 8.5+ [#&#8203;7787](FreshRSS/FreshRSS#7787), [#&#8203;7826](FreshRSS/FreshRSS#7826)
    - Docker Alpine dev image `:newest` updated to PHP 8.5-alpha and Apache 2.4.65 [#&#8203;7773](FreshRSS/FreshRSS#7773)
  - Docker: interpolate `FRESHRSS_INSTALL` and `FRESHRSS_USER` variables [#&#8203;7725](FreshRSS/FreshRSS#7725)
  - Docker: Reduce how much data needs to be chown/chmod’ed on container startup [#&#8203;7793](FreshRSS/FreshRSS#7793)
  - Test for database PDO typing support during install (relevant for MySQL / MariaDB with obsolete driver) [#&#8203;7651](FreshRSS/FreshRSS#7651)
- Extensions
  - Add API endpoint for extensions [#&#8203;7576](FreshRSS/FreshRSS#7576)
  - Expose the reading modes for extensions [#&#8203;7668](FreshRSS/FreshRSS#7668), [#&#8203;7688](FreshRSS/FreshRSS#7688)
  - New extension hook `before_login_btn` [#&#8203;7761](FreshRSS/FreshRSS#7761)
- UI
  - Improve *mark as read* request showing popup due to `onbeforeunload` [#&#8203;7554](FreshRSS/FreshRSS#7554)
  - Fix lazy-loading for `<video poster="...">` and `<image>` [#&#8203;7636](FreshRSS/FreshRSS#7636)
  - Avoid styling `<code>` inside of `<pre>` [#&#8203;7797](FreshRSS/FreshRSS#7797)
  - Improve confirmation logic with `data-auto-leave-validation` [#&#8203;7785](FreshRSS/FreshRSS#7785)
  - Update `chart.js` to 4.5.0 [#&#8203;7752](FreshRSS/FreshRSS#7752), [#&#8203;7816](FreshRSS/FreshRSS#7816)
  - Various UI and style improvements: [#&#8203;7616](FreshRSS/FreshRSS#7616), [#&#8203;7811](FreshRSS/FreshRSS#7811)
- I18n
  - Show translation status in README [#&#8203;7715](FreshRSS/FreshRSS#7715)
  - Improve Indonesian [#&#8203;7654](FreshRSS/FreshRSS#7654), [#&#8203;7721](FreshRSS/FreshRSS#7721)
  - Improve Persian [#&#8203;7795](FreshRSS/FreshRSS#7795)
- Misc.
  - Improve PHP code [#&#8203;7642](FreshRSS/FreshRSS#7642), [#&#8203;7665](FreshRSS/FreshRSS#7665), [#&#8203;7761](FreshRSS/FreshRSS#7761),
    [#&#8203;7781](FreshRSS/FreshRSS#7781), [#&#8203;7794](FreshRSS/FreshRSS#7794)
  - Update dev dependencies [#&#8203;7708](FreshRSS/FreshRSS#7708), [#&#8203;7709](FreshRSS/FreshRSS#7709), [#&#8203;7710](FreshRSS/FreshRSS#7710),
    [#&#8203;7711](FreshRSS/FreshRSS#7711), [#&#8203;7776](FreshRSS/FreshRSS#7776), [#&#8203;7777](FreshRSS/FreshRSS#7777)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0MS4zNS4wIiwidXBkYXRlZEluVmVyIjoiNDEuMzUuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiaW1hZ2UiXX0=-->

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/1253
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants