
JSONFeeds, JSON scraping, and POST requests for feeds #5662

Merged
Alkarex merged 32 commits into FreshRSS:edge from eta-orionis:edge
Jan 10, 2024

@eta-orionis
Contributor

@eta-orionis eta-orionis commented Sep 19, 2023

Closes #1551

Changes proposed in this pull request:

  • Allow feeds to be requested via HTTP POST method (useful e.g. for web scraping the results of a search, form, or API request).
  • Subscribe to JSONFeeds
  • Subscribe to feeds generated by scraping JSON responses, similarly to HTML+XPath scraping. For JSON, however, a dotted.path.notation is provided, which is much easier to understand and follow than XPath

The code is adapted from the existing HTML+XPath scraping, with a few utilities added to support JSON dotted paths instead of XPath. It defines two new kinds of feeds and branches accordingly, in the same way as the HTML+XPath feeds do.
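
To illustrate the dotted-path idea, here is a minimal sketch of a lookup helper. The function name `dotPathGet()` and the sample JSON are assumptions made for this example; this is not the code added by the pull request.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not FreshRSS's actual function): resolve a dotted path
 * such as "Image.ImageUrl" against a decoded JSON array.
 *
 * @param array<mixed> $data decoded JSON (json_decode() with associative arrays)
 * @return mixed the value found, or null if any segment of the path is missing
 */
function dotPathGet(array $data, string $path) {
	$current = $data;
	foreach (explode('.', $path) as $segment) {
		// Stop as soon as the path leaves the structure
		if (!is_array($current) || !array_key_exists($segment, $current)) {
			return null;
		}
		$current = $current[$segment];
	}
	return $current;
}

// Made-up sample response, only for demonstration
$json = '{"Results":[{"Title":"Hello","Image":{"ImageUrl":"https://example.net/a.png"}}]}';
$data = json_decode($json, true);

foreach (dotPathGet($data, 'Results') ?? [] as $item) {
	echo dotPathGet($item, 'Title'), ' ', dotPathGet($item, 'Image.ImageUrl'), "\n";
}
```

In this sketch a missing segment yields null rather than an error, so a feed whose items lack an optional field can still be processed.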

(Most of the added lines come from new keys in all 24 i18n language files.)

How to test the feature manually:

  1. Subscribe to a JSONFeed, e.g. https://www.jsonfeed.org/feed.json
  2. Subscribe to any API that returns JSON. (Potentially pass request parameters via POST)
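
For the first test case, the following sketch shows roughly what reading a JSON Feed involves. The field names (`items`, `id`, `url`, `title`, `content_html`, `content_text`, `date_published`) come from the JSON Feed format; the function name `parseJsonFeed()`, the shape of the returned entries, and the sample document are assumptions for illustration, not the implementation in this pull request.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only: turn a JSON Feed document into a flat list of entries.
 *
 * @return array<int,array<string,mixed>> one array per item (hypothetical shape)
 */
function parseJsonFeed(string $body): array {
	$feed = json_decode($body, true);
	if (!is_array($feed) || !isset($feed['items']) || !is_array($feed['items'])) {
		return [];
	}
	$entries = [];
	foreach ($feed['items'] as $item) {
		if (!is_array($item) || !isset($item['id'])) {
			continue;	// "id" is required for every item in a JSON Feed
		}
		$date = isset($item['date_published']) ? strtotime((string)$item['date_published']) : false;
		$entries[] = [
			'guid' => (string)$item['id'],
			'link' => (string)($item['url'] ?? ''),
			'title' => (string)($item['title'] ?? ''),
			// Prefer HTML content, fall back to plain text
			'content' => (string)($item['content_html'] ?? $item['content_text'] ?? ''),
			'date' => $date === false ? 0 : $date,
		];
	}
	return $entries;
}

// Made-up sample document, only for demonstration
$sample = '{"version":"https://jsonfeed.org/version/1.1","title":"Demo","items":['
	. '{"id":"1","url":"https://example.net/1","title":"First","content_text":"Hi","date_published":"2024-01-10T12:00:00Z"},'
	. '{"title":"No id, skipped"}]}';
$entries = parseJsonFeed($sample);
echo count($entries), " entry parsed\n";
```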

Pull request checklist:

  • clear commit messages
  • code manually tested
  • unit tests written (optional if too hard)
  • documentation updated

Additional information can be found in the documentation.

@eta-orionis eta-orionis changed the title from "allow POST requests for feeds" to "JSONFeeds, JSON scraping, and POST requests for feeds" Sep 19, 2023
@eta-orionis eta-orionis marked this pull request as ready for review September 19, 2023 15:59
// TODO: Implement HTTP 410 Gone
} elseif (!is_string($body) || strlen($body) === 0) {
$body = '';
} else {
Contributor Author

enforceHttpEncoding is not written to deal with JSON, but only with HTML/XML. No need to call it for JSON.

@Alkarex Alkarex modified the milestones: 1.22.0, 1.23.0 Sep 19, 2023
and revert unrelated changes, plus a few manual fixes, but there are still several type errors
@Alkarex
Member

Alkarex commented Sep 19, 2023

Thanks @eta-orionis , this looks promising 👍🏻
I would like to release FreshRSS 1.22 (soon) before merging some new features like this one.
In the meantime, you are welcome to try to address the various type errors caught by our automated PHPStan.

Something else: if it is not too much work, could you try to see whether you could avoid the dependency on SimpleXML? We use it in only one little function at the moment in our codebase, which I would like to rewrite, to reduce our requirements and footprint.

A little tip: you can run make fix-all and make test-all locally to auto-fix many low-importance issues and to run the automated tests. If for some reason that does not work in your local environment, you can try our dev container such as https://github.com/codespaces/new?hide_repo_select=true&ref=edge&repo=693185154

@Alkarex
Member

Alkarex commented Sep 19, 2023

Might replace:

@eta-orionis
Contributor Author

eta-orionis commented Sep 19, 2023

Thanks for the quick review @Alkarex !

In the meantime, you are welcome to try to address the various type errors caught by our automated PHPStan.

I'd be happy to, especially now that the PR has a chance and that, thanks to your tip, I understood how the test system works :)

Something else: if it is not too much work, could you try to see whether you could avoid the dependency on SimpleXML?

I'd like to refactor and simplify my two JSON processing functions into one; this will remove the SimpleXML dependency as well.

@Alkarex
Member

Alkarex commented Jan 3, 2024

  • Import/export of attributes such as jsonItem through OPML is still missing

@Alkarex
Member

Alkarex commented Jan 3, 2024

Example to test, which can be imported by OPML:

<?xml version="1.0" encoding="UTF-8"?>
<opml xmlns:frss="https://freshrss.org/opml" version="2.0">
	<head>
		<title>FreshRSS</title>
		<dateCreated>Wed, 03 Jan 2024 14:49:54 +0100</dateCreated>
	</head>
	<body>
		<outline
			text="DTU"
			type="JSON+DotPath"
			xmlUrl="https://www.dtu.dk/api/v1/news/newslist"
			htmlUrl="https://www.dtu.dk/nyheder/alle-nyheder"
			description="Alle nyheder"
			frss:jsonItem="Results"
			frss:jsonItemTitle="Title"
			frss:jsonItemContent="Summary"
			frss:jsonItemUri="Url"
			frss:jsonItemTimestamp="Date"
			frss:jsonItemThumbnail="Image.ImageUrl"
			frss:jsonItemCategories="Badge.Title"
			frss:jsonItemUid="Url"
			frss:CURLOPT_POST="1"
			frss:CURLOPT_POSTFIELDS="{&quot;Pagination&quot;:{&quot;Number&quot;:1,&quot;Size&quot;:&quot;Ten&quot;},&quot;ListItemId&quot;:&quot;240a27b9-4dd2-4f7f-9c58-773d643d11b7&quot;}"
			frss:CURLOPT_USERAGENT="curl"
			frss:CURLOPT_HTTPHEADER="Content-Type: application/json"/>
	</body>
</opml>
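
The `frss:CURLOPT_*` attributes above are named after PHP cURL options. As a rough sketch of the request they describe, the code below builds an equivalent option array and sends it. The function names `buildPostOptions()` and `fetchJsonViaPost()` are hypothetical, the code assumes the PHP cURL extension is available, and how FreshRSS applies these attributes internally is not shown in this conversation.

```php
<?php
declare(strict_types=1);

/**
 * Sketch only: cURL options mirroring the frss:CURLOPT_* attributes of the OPML example.
 *
 * @param array<mixed> $payload data to send as the JSON request body
 * @return array<int,mixed>
 */
function buildPostOptions(array $payload): array {
	return [
		CURLOPT_POST => true,
		CURLOPT_POSTFIELDS => json_encode($payload),
		CURLOPT_USERAGENT => 'curl',
		CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
		CURLOPT_RETURNTRANSFER => true,	// return the body instead of printing it
	];
}

/**
 * Hypothetical helper: POST the payload and decode the JSON response.
 *
 * @param array<mixed> $payload
 * @return array<mixed>|null decoded response, or null on failure
 */
function fetchJsonViaPost(string $url, array $payload): ?array {
	$ch = curl_init($url);
	if ($ch === false) {
		return null;
	}
	curl_setopt_array($ch, buildPostOptions($payload));
	$body = curl_exec($ch);	// performs the network request
	$data = is_string($body) ? json_decode($body, true) : null;
	return is_array($data) ? $data : null;
}

// Same request body as in frss:CURLOPT_POSTFIELDS above
$payload = [
	'Pagination' => ['Number' => 1, 'Size' => 'Ten'],
	'ListItemId' => '240a27b9-4dd2-4f7f-9c58-773d643d11b7',
];
$options = buildPostOptions($payload);
echo $options[CURLOPT_POSTFIELDS], "\n";
```

Calling `fetchJsonViaPost('https://www.dtu.dk/api/v1/news/newslist', $payload)` would then return the decoded response, in which `Results` holds the items and `Title`, `Summary`, `Url`, and so on are read per item, as configured by the `frss:jsonItem*` attributes.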

@Alkarex Alkarex merged commit 9c97d8c into FreshRSS:edge Jan 10, 2024
This was referenced Jan 10, 2024
@Alkarex
Member

Alkarex commented Jan 10, 2024

@eta-orionis Thanks again, and please add a line for you in https://github.com/FreshRSS/FreshRSS/blob/edge/CREDITS.md

@math-GH
Contributor

math-GH commented Jan 10, 2024

I would like to have 2 or 3 example feeds. Could you please help with that here?

@gingerbeardman
Contributor

Some examples in #1551 @math-GH

@Alkarex
Member

Alkarex commented Jan 13, 2024

Fixed PHP 7.4 compatibility in #6038

$view->entries[] = FreshRSS_Entry::fromArray($rssItem);
}
}
} catch (Exception $ex) {
Member

It does not look like there was anything to catch there.
Addressed in #6037

'xpath' => 'XPath for:',
),
'json_dotpath' => array(
'_' => 'JSON (Dotted paths)',
Contributor

Dotted paths => would Dot-Notation be the correct term here?

(ping @Alkarex @Frenzie )

Member

Dot notation is the term I'm familiar with, yes.

Member

Back when we had the PR, I tried to search for standard / usual names. I could not find much, so references welcome

Member

Dot notation is what it's called in JS and Java (and probably other languages as well), as distinguished from this['is']['equivalent']['bracket']['notation'].

It's not related to JSON per se of course, just to how you access object values.

Contributor

i18n Strings improved: #6317

@Alkarex
Member

Alkarex commented Sep 21, 2024

Some encoding issues (e.g. with special characters such as <'&"> ) were fixed in #6821
Tests welcome

Development

Successfully merging this pull request may close these issues.

[Feature request] JSON Feed

6 participants