JSONFeeds, JSON scraping, and POST requests for feeds #5662
Alkarex merged 32 commits into FreshRSS:edge
// TODO: Implement HTTP 410 Gone
} elseif (!is_string($body) || strlen($body) === 0) {
	$body = '';
} else {
enforceHttpEncoding is not written to deal with JSON, but only with HTML/XML. No need to call it for JSON.
and revert unrelated changes, plus a few manual fixes, but there are still several type errors
|
Thanks @eta-orionis, this looks promising 👍🏻 Something else: if it is not too much work, could you see whether you can avoid the dependency on SimpleXML? At the moment we use it in only one small function in our codebase, which I would like to rewrite to reduce our requirements and footprint. A little tip: you can run locally |
|
Might replace: |
|
Thanks for the quick review @Alkarex !
I'd be happy to, especially now that the PR has a chance and that, thanks to your tip, I understood how the test system works :)
I'd like to refactor and simplify my two JSON processing functions into one; this will remove the SimpleXML dependency as well. |
|
|
Example to test, which can be imported by OPML:
<?xml version="1.0" encoding="UTF-8"?>
<opml xmlns:frss="https://freshrss.org/opml" version="2.0">
<head>
<title>FreshRSS</title>
<dateCreated>Wed, 03 Jan 2024 14:49:54 +0100</dateCreated>
</head>
<body>
<outline
text="DTU"
type="JSON+DotPath"
xmlUrl="https://www.dtu.dk/api/v1/news/newslist"
htmlUrl="https://www.dtu.dk/nyheder/alle-nyheder"
description="Alle nyheder"
frss:jsonItem="Results"
frss:jsonItemTitle="Title"
frss:jsonItemContent="Summary"
frss:jsonItemUri="Url"
frss:jsonItemTimestamp="Date"
frss:jsonItemThumbnail="Image.ImageUrl"
frss:jsonItemCategories="Badge.Title"
frss:jsonItemUid="Url"
frss:CURLOPT_POST="1"
frss:CURLOPT_POSTFIELDS="{&quot;Pagination&quot;:{&quot;Number&quot;:1,&quot;Size&quot;:&quot;Ten&quot;},&quot;ListItemId&quot;:&quot;240a27b9-4dd2-4f7f-9c58-773d643d11b7&quot;}"
frss:CURLOPT_USERAGENT="curl"
frss:CURLOPT_HTTPHEADER="Content-Type: application/json"/>
</body>
</opml>
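A JSON body stored in an XML attribute such as `frss:CURLOPT_POSTFIELDS` needs its double quotes escaped, otherwise the attribute ends at the first quote. The following is a minimal sketch using only standard PHP functions; the variable names are illustrative and this is not taken from the FreshRSS export code.

```php
<?php
declare(strict_types=1);

// Body of the POST request, mirroring the example feed above.
$payload = [
	'Pagination' => ['Number' => 1, 'Size' => 'Ten'],
	'ListItemId' => '240a27b9-4dd2-4f7f-9c58-773d643d11b7',
];
$json = json_encode($payload, JSON_UNESCAPED_SLASHES);

// Escape for a double-quoted XML attribute: each " becomes &quot;
$attribute = htmlspecialchars($json, ENT_COMPAT | ENT_XML1, 'UTF-8');
echo 'frss:CURLOPT_POSTFIELDS="', $attribute, '"', "\n";

// Decoding the attribute value restores the original JSON text.
$restored = htmlspecialchars_decode($attribute, ENT_COMPAT | ENT_XML1);
```

An XML parser performs the decoding step itself when reading the attribute, so the feed receives the plain JSON string as its POST body.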
|
@eta-orionis Thanks again, and please add a line for you in https://github.com/FreshRSS/FreshRSS/blob/edge/CREDITS.md |
|
I would like to have 2 or 3 example feeds. Could you please help with that here? |
|
Fixed PHP 7.4 compatibility in #6038 |
			$view->entries[] = FreshRSS_Entry::fromArray($rssItem);
		}
	}
} catch (Exception $ex) {
It does not look like there was anything to catch there.
Addressed in #6037
	'xpath' => 'XPath for:',
),
'json_dotpath' => array(
	'_' => 'JSON (Dotted paths)',
Dot notation is the term I'm familiar with, yes.
Back when we had the PR, I tried to search for standard / usual names. I could not find much, so references are welcome.
Dot notation is what it's called in JS and Java (and probably other languages as well), as distinguished from this['is']['equivalent']['bracket']['notation'].
It's not related to JSON per se of course, just to how you access object values.
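To illustrate the equivalence mentioned above, here is a minimal sketch in PHP of a dotted-path lookup next to the matching bracket access. The function `dotPathGet()` is a hypothetical helper written for this example, not the implementation used in FreshRSS.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: resolve a dotted path such as "Image.ImageUrl"
 * against decoded JSON (associative arrays).
 *
 * @param array<mixed> $data
 * @param mixed $default returned when any segment of the path is missing
 * @return mixed
 */
function dotPathGet(array $data, string $path, $default = null) {
	$current = $data;
	foreach (explode('.', $path) as $segment) {
		if (!is_array($current) || !array_key_exists($segment, $current)) {
			return $default;
		}
		$current = $current[$segment];
	}
	return $current;
}

$item = json_decode('{"Title":"Hello","Image":{"ImageUrl":"https://example.net/a.png"}}', true);

// Dot notation and bracket notation reach the same value.
$viaDots = dotPathGet($item, 'Image.ImageUrl');
$viaBrackets = $item['Image']['ImageUrl'];
```

Unlike direct bracket access, the helper returns a default instead of raising a warning when a key is absent, which is convenient for feeds whose items do not all share the same fields.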
|
Some encoding issues (e.g. with special characters such as |
Closes #1551
Changes proposed in this pull request:
The code is adapted from the existing HTML+XPath scraping, with a few utilities added to support JSON dotted paths instead of XPath. It defines two new kinds of feeds and branches accordingly, in the same way as HTML+XPath.
(Most of the added lines are from adding new keys in all 24 i18n language files)
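As a rough illustration of the idea (not the code from this pull request), the sketch below shows how a set of dotted paths could map a JSON response to entry fields, much like XPath expressions map HTML nodes. The functions `jsonPath()` and `jsonToEntries()` and the field names are assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: follow a dotted path through decoded JSON.
 * @param array<mixed> $data
 * @return mixed the value found, or null when a segment is missing
 */
function jsonPath(array $data, string $path) {
	$current = $data;
	foreach (explode('.', $path) as $key) {
		if (!is_array($current) || !array_key_exists($key, $current)) {
			return null;
		}
		$current = $current[$key];
	}
	return $current;
}

/**
 * Hypothetical mapping from a decoded JSON document to simple entry arrays.
 * @param array<mixed> $json
 * @param array<string,string> $paths dotted paths keyed by entry field name
 * @return array<int,array<string,mixed>>
 */
function jsonToEntries(array $json, string $itemPath, array $paths): array {
	$items = jsonPath($json, $itemPath);
	if (!is_array($items)) {
		return [];
	}
	$entries = [];
	foreach ($items as $item) {
		if (!is_array($item)) {
			continue; // skip scalars: only objects can hold fields
		}
		$entry = [];
		foreach ($paths as $field => $path) {
			$entry[$field] = jsonPath($item, $path);
		}
		$entries[] = $entry;
	}
	return $entries;
}

$json = json_decode(
	'{"Results":[{"Title":"News","Url":"https://example.net/1","Image":{"ImageUrl":"https://example.net/1.png"}}]}',
	true
);
$entries = jsonToEntries($json, 'Results', [
	'title' => 'Title',
	'link' => 'Url',
	'thumbnail' => 'Image.ImageUrl',
]);
print_r($entries);
```

Keeping the path lookup separate from the mapping means the same lookup serves both the item list (`Results`) and the per-item fields.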
How to test the feature manually:
Pull request checklist:
Additional information can be found in the documentation.