Skip to content

Handle redirects when scraping feed from HTML#7654

Merged
Alkarex merged 7 commits intoFreshRSS:edgefrom
Inverle:fix-relative-link
Jun 21, 2025
Merged

Handle redirects when scraping feed from HTML#7654
Alkarex merged 7 commits intoFreshRSS:edgefrom
Inverle:fix-relative-link

Conversation

@Inverle
Copy link
Member

@Inverle Inverle commented Jun 5, 2025

Fixes #6991

It wasn't possible to correctly scrape https://blog.thea.codes/feed.xml because the entry URLs didn't have a trailing slash in them.
If you opened one of the entries in a browser, it would automatically redirect you to a URL with a trailing slash in it.

e.g. https://blog.thea.codes/my-2024 -> https://blog.thea.codes/my-2024/

In this PR, redirects are now handled properly, and the entry URLs are replaced with the redirected versions of them.

Inverle and others added 3 commits June 6, 2025 14:25
which should not be overriden
@Alkarex Alkarex merged commit 18b5c8e into FreshRSS:edge Jun 21, 2025
1 check passed
@Inverle Inverle deleted the fix-relative-link branch July 7, 2025 15:38
pull bot pushed a commit to lilikimi/FreshRSS that referenced this pull request Jul 16, 2025
)

Follow-up to FreshRSS#7654 (comment)

Changes proposed in this pull request:

- `DOMNode::insertBefore()` needs to be called on an element that is the parent of the `$child` param being passed
- Update code to call this on `$doc->documentElement` instead of directly on the `$doc` (`DOMDocument`)

How to test the feature manually:

1. Set up an HTML + XPath feed for a URL that contains partial HTML content (eg. https://victoria.citified.ca/modules/blog/news.php?n=7&c=8)
1. Observe that the feed is processed successfully without error, and that the `<base>` is still inserted
alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Aug 20, 2025
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [freshrss/freshrss](https://freshrss.org/) ([source](https://github.com/FreshRSS/FreshRSS)) | minor | `1.26.3` -> `1.27.0` |

---

### Release Notes

<details>
<summary>FreshRSS/FreshRSS (freshrss/freshrss)</summary>

### [`v1.27.0`](https://github.com/FreshRSS/FreshRSS/blob/HEAD/CHANGELOG.md#2025-08-18-FreshRSS-1270)

[Compare Source](FreshRSS/FreshRSS@1.26.3...1.27.0)

- Features
  - Implement support for HTTP `429 Too Many Requests` and `503 Service Unavailable`, obey `Retry-After` [#&#8203;7760](FreshRSS/FreshRSS#7760)
  - Add sort by category title, or by feed title [#&#8203;7702](FreshRSS/FreshRSS#7702)
  - Add search operator `c:` for categories like `c:23,34` or `!c:45,56` [#&#8203;7696](FreshRSS/FreshRSS#7696)
  - Custom feed favicons [#&#8203;7646](FreshRSS/FreshRSS#7646), [#&#8203;7704](FreshRSS/FreshRSS#7704), [#&#8203;7717](FreshRSS/FreshRSS#7717),
    [#&#8203;7792](FreshRSS/FreshRSS#7792)
  - Rework fetch favicons for fewer HTTP requests [#&#8203;7767](FreshRSS/FreshRSS#7767)
  - Add more unicity criteria based on title and/or content [#&#8203;7789](FreshRSS/FreshRSS#7789)
  - Automatically restore user configuration from backup [#&#8203;7682](FreshRSS/FreshRSS#7682)
  - API add support for states in `s` parameter of `streamId` [#&#8203;7695](FreshRSS/FreshRSS#7695)
  - Improve sharing via Print [#&#8203;7728](FreshRSS/FreshRSS#7728)
  - Redirect to the login page from bookmarklet instead of 403 [#&#8203;7782](FreshRSS/FreshRSS#7782)
  - Clean local cache more often, when refreshing feeds [#&#8203;7827](FreshRSS/FreshRSS#7827)
- Security
  - Implement reauthentication (*sudo* mode) [#&#8203;7753](FreshRSS/FreshRSS#7753)
  - Add `Content-Security-Policy: frame-ancestors` [#&#8203;7677](FreshRSS/FreshRSS#7677)
  - Ensure CSP everywhere [#&#8203;7810](FreshRSS/FreshRSS#7810)
  - Show warning when unsafe CSP policy is in use [#&#8203;7804](FreshRSS/FreshRSS#7804)
  - Fix access rights when creating a new user [#&#8203;7783](FreshRSS/FreshRSS#7783)
  - Improve security of form for user details [#&#8203;7771](FreshRSS/FreshRSS#7771), [#&#8203;7786](FreshRSS/FreshRSS#7786)
  - Disallow setting non-existent theme [#&#8203;7722](FreshRSS/FreshRSS#7722)
  - Regenerate cookie ID after logging out [#&#8203;7762](FreshRSS/FreshRSS#7762)
  - Require current password when setting new password [#&#8203;7763](FreshRSS/FreshRSS#7763)
  - Add missing access checks for feed-related actions [#&#8203;7768](FreshRSS/FreshRSS#7768)
  - Strip more unsafe attributes such as `referrerpolicy`, `ping` [#&#8203;7770](FreshRSS/FreshRSS#7770)
  - Remove unneeded execution permissions [#&#8203;7802](FreshRSS/FreshRSS#7802)
- Bug fixing
  - Fix redirections when scraping from HTML [#&#8203;7654](FreshRSS/FreshRSS#7654), [#&#8203;7741](FreshRSS/FreshRSS#7741)
  - Fix multiple authentication HTTP headers [#&#8203;7703](FreshRSS/FreshRSS#7703)
  - Fix HTML queries with a single feed [#&#8203;7730](FreshRSS/FreshRSS#7730)
  - WebSub: only perform a redirection when coming from WebSub [#&#8203;7738](FreshRSS/FreshRSS#7738)
  - Include enclosures in entries’ hash [#&#8203;7719](FreshRSS/FreshRSS#7719)
    - Negative side-effect: users of the option to *automatically mark updated articles as unread* will once have some articles with enclosures re-appear as unread
  - Fix cancellation of slider exit UI [#&#8203;7705](FreshRSS/FreshRSS#7705)
  - Honor *disable update* on update page [#&#8203;7733](FreshRSS/FreshRSS#7733)
  - Fix no registration limit setting [#&#8203;7751](FreshRSS/FreshRSS#7751)
  - Fix XML encoding of sharing functions [#&#8203;7822](FreshRSS/FreshRSS#7822)
- SimplePie
  - Fix propagation of HTTP error codes [#&#8203;7670](FreshRSS/FreshRSS#7670)
  - Fix support for XML feeds with HTML entities [#&#8203;7689](FreshRSS/FreshRSS#7689), [simplepie#915](simplepie/simplepie#915)
  - Fix feeds encoded in UTF-16LE [#&#8203;7691](FreshRSS/FreshRSS#7691), [simplepie#916](simplepie/simplepie#916)
  - Various upstream contributions [simplepie#917](simplepie/simplepie#917), [simplepie#924](simplepie/simplepie#924),
    [simplepie#926](simplepie/simplepie#926), [simplepie#932](simplepie/simplepie#932), [simplepie#933](simplepie/simplepie#933)
  - Sync upstream [#&#8203;7706](FreshRSS/FreshRSS#7706), [FreshRSS/simplepie#45](FreshRSS/simplepie#45), [#&#8203;7775](FreshRSS/FreshRSS#7775),
    [FreshRSS/simplepie#50](FreshRSS/simplepie#50), [#&#8203;7824](FreshRSS/FreshRSS#7824), [#&#8203;7825](FreshRSS/FreshRSS#7825),
  - Fix regex *Backtrack limit was exhausted* in `clean_hash()` [#&#8203;7813](FreshRSS/FreshRSS#7813), [FreshRSS/simplepie#48](FreshRSS/simplepie#48)
- Deployment
  - Docker default image (Debian 12 Bookworm) updated to PHP 8.2.29 [#&#8203;7805](FreshRSS/FreshRSS#7805)
  - Docker alternative image updated to Alpine 3.22 with PHP 8.4.11 and Apache 2.4.65 [#&#8203;7740](FreshRSS/FreshRSS#7740), [#&#8203;7740](FreshRSS/FreshRSS#7740),
    [#&#8203;7803](FreshRSS/FreshRSS#7803)
  - Start supporting PHP 8.5+ [#&#8203;7787](FreshRSS/FreshRSS#7787), [#&#8203;7826](FreshRSS/FreshRSS#7826)
    - Docker Alpine dev image `:newest` updated to PHP 8.5-alpha and Apache 2.4.65 [#&#8203;7773](FreshRSS/FreshRSS#7773)
  - Docker: interpolate `FRESHRSS_INSTALL` and `FRESHRSS_USER` variables [#&#8203;7725](FreshRSS/FreshRSS#7725)
  - Docker: Reduce how much data needs to be chown/chmod’ed on container startup [#&#8203;7793](FreshRSS/FreshRSS#7793)
  - Test for database PDO typing support during install (relevant for MySQL / MariaDB with obsolete driver) [#&#8203;7651](FreshRSS/FreshRSS#7651)
- Extensions
  - Add API endpoint for extensions [#&#8203;7576](FreshRSS/FreshRSS#7576)
  - Expose the reading modes for extensions [#&#8203;7668](FreshRSS/FreshRSS#7668), [#&#8203;7688](FreshRSS/FreshRSS#7688)
  - New extension hook `before_login_btn` [#&#8203;7761](FreshRSS/FreshRSS#7761)
- UI
  - Improve *mark as read* request showing popup due to `onbeforeunload` [#&#8203;7554](FreshRSS/FreshRSS#7554)
  - Fix lazy-loading for `<video poster="...">` and `<image>` [#&#8203;7636](FreshRSS/FreshRSS#7636)
  - Avoid styling `<code>` inside of `<pre>` [#&#8203;7797](FreshRSS/FreshRSS#7797)
  - Improve confirmation logic with `data-auto-leave-validation` [#&#8203;7785](FreshRSS/FreshRSS#7785)
  - Update `chart.js` to 4.5.0 [#&#8203;7752](FreshRSS/FreshRSS#7752), [#&#8203;7816](FreshRSS/FreshRSS#7816)
  - Various UI and style improvements: [#&#8203;7616](FreshRSS/FreshRSS#7616), [#&#8203;7811](FreshRSS/FreshRSS#7811)
- I18n
  - Show translation status in README [#&#8203;7715](FreshRSS/FreshRSS#7715)
  - Improve Indonesian [#&#8203;7654](FreshRSS/FreshRSS#7654), [#&#8203;7721](FreshRSS/FreshRSS#7721)
  - Improve Persian [#&#8203;7795](FreshRSS/FreshRSS#7795)
- Misc.
  - Improve PHP code [#&#8203;7642](FreshRSS/FreshRSS#7642), [#&#8203;7665](FreshRSS/FreshRSS#7665), [#&#8203;7761](FreshRSS/FreshRSS#7761),
    [#&#8203;7781](FreshRSS/FreshRSS#7781), [#&#8203;7794](FreshRSS/FreshRSS#7794)
  - Update dev dependencies [#&#8203;7708](FreshRSS/FreshRSS#7708), [#&#8203;7709](FreshRSS/FreshRSS#7709), [#&#8203;7710](FreshRSS/FreshRSS#7710),
    [#&#8203;7711](FreshRSS/FreshRSS#7711), [#&#8203;7776](FreshRSS/FreshRSS#7776), [#&#8203;7777](FreshRSS/FreshRSS#7777)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0MS4zNS4wIiwidXBkYXRlZEluVmVyIjoiNDEuMzUuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiaW1hZ2UiXX0=-->

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/1253
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] FreshRSS body scraper does not properly handle relative links

3 participants