Generate HTML episode description only when needed. #1246

auouymous · 2022-03-12T08:40:53Z

WARNING: This patch changes the database schema and version. A backup will be created in your gPodder directory the first time you use this patch on an existing database.

PR #1094 generated an HTML description for any episode that lacked one. That however increased the database size (almost double in worst case) because it was storing both text and html versions of each description. This fixes that by storing the episode thumbnail URL in the database and generating the HTML description only when shownotes are drawn.

The text description is now cleared for episodes with an HTML description. This further reduces database size for feeds that provide both. It also fixes an issue for feeds that provide different text and HTML descriptions, because the short description would show the text description and shownotes would show the HTML description. And EQL only searched the text descriptions, which might not match what the user sees in the shownotes.

elelay · 2022-03-16T20:50:00Z

I like the episode art url. gpodder-core has also added it, downloading the image when downloading the episode, deleting it when deleting the episode. They use it for the embedded player, though: useless to us. Nevermind: not the subject of this PR.
But, please change the column to episode_art_url for consistency.

I used to be against overriding the description when the author has taken the time to make it different than the html description. But I can't find examples anymore so I'm willing to let it go.

Is performance OK when searching now that the text_description is generated for each keypress (and also for displaying when display description is enabled) ? If OK, no objection.
If not, is it possible to have text_description memoized in a transient field?

As there is a schema change, shall we also add the chapters string field so we can add podlove chapters support in gpodder-core since 2014 later? This could go in a different PR but should be in the same release.

Last missing field from gpodder-core episode model would be the subtitle, but I don't really see the point of it. What do you think.

elelay

Thanks for the PR.
Please see previous comment. At least change the column name in SQLite.
Code seems fine but I haven't run the branch yet.

auouymous · 2022-03-17T02:26:06Z

Field renamed.

downloading the image when downloading the episode, deleting it when deleting the episode

I've been using a patch since August that does exactly that, and it displays the image in text and html shownotes, as well as in the episode tooltips. Just need to find some time to fix a couple minor issues and add settings to disable it.

overriding the description when the author has taken the time to make it different than the html description.

http://feeds.feedburner.com/se-radio stores the same feed description in the text description for every episode, and each html description is unique per episode. There could be feeds that do the same but reverse the fields. The problem is that episode list and EQL only used the text description and shownotes used the html description unless it was empty. Shownotes, EQL and episode list should use the same description for consistency, so it doesn't make sense to keep both around if we are going to prioritize one over the other.

Is performance OK

I actually didn't benchmark EQL and episode list performance, oops! Shownotes increased from 1.5μs to 3.8μs for html and 137μs to convert text to html. I figured this 91x increase was acceptable since shownotes have ~3ms of additional overhead. Of course rapidly up/down arrowing through episode list would be slower with shownotes open.

The EQL and episode list went from 1.5μs to 78-105μs (text or html stripping) which is a 52-70x increase amplified by every episode in the database. For 1000 episodes, it would take 76-103ms longer (for me) to query the episode list per key press, and this could be much greater on a slower device. Refreshing the episode list would be almost the same speed for text descriptions because one_line_description() is now 74μs faster.

I had text_description() strip the text description because youtube-dl and soundcloud did, but model.py didn't. I now see that podcastparser strips it for model.py. Not stripping in text_description() and html_description() will reduce text->html in shownotes from 137μs to 65μs, and EQL and episode list from 78-105μs to 3.8-105μs (still 2.5-70x). This would make episode list refreshing 74μs faster per text description episode than it is without this PR (after I remove stripping).

So I think stripping the html description and memoizing it in the text description is needed.

shall we also add the chapters string field

I don't know what the chapters or subtitle fields would do, but I'd say add them to the database in this PR if they can be fully implemented before release.

elelay · 2022-03-18T16:04:17Z

About chapters: having them in the database from podcastparser is a first step and quite easy: json.dumps them on feed update. Then:

we could make the MPRIS2-compatible player jump to a particular chapter via a menu.
we could display them with shownotes so users can manually jump to the time if they want.

auouymous · 2022-03-21T01:23:17Z

The text description is now cached at startup and during feed updates. EQL and episode list refresh is the same speed as it was before, although episode list might be bit faster due to it no longer stripping the description. The shownotes will take about 2% longer to update.

What database fields are needed to store chapters?

elelay · 2022-03-21T18:55:10Z

See branch auouymous-ondemand-html-description for 2 commits for chapters+subtitles in addition to your PR.
Not sure yet about subtitle: maybe add the field but don't put anything in it?

elelay · 2022-03-21T18:57:03Z

See https://feeds.metaebene.me/cre/m4a for a feed with chapters

elelay

LGTM

auouymous · 2022-03-22T04:20:52Z

I pushed your changes, and the subtitle and chapters are working for me on https://feeds.metaebene.me/cre/m4a episodes (with my debug extension).

As for subtitle, info['subtitle'] = episode._text_description in the tagging extension could use it as info['subtitle'] = episode.subtitle or episode._text_description. We could add a smaller line below title in shownotes for episodes with a subtitle, and query it with EQL. Or it could be added before the description in shownotes just in case a feed uses long subtitles. But we shouldn't add it to the database unless we actually plan to use it.

elelay · 2022-03-23T21:18:20Z

Let's have a look at subtitles in our subscriptions:

https://feeds.fireside.fm/bsdnow/rss has a clean one-line subtitle based on description but hand crafted
http://feeds.themoth.org/themothpodcast as a one-line subtitle

more examples:

sqlite> select podcast.url, e.cnt from (select podcast_id, count(*) as cnt from episode where subtitle is not null group by podcast_id) e join podcast on e.podcast_id = podcast.id;
http://feeds.themoth.org/themothpodcast|209
https://feeds.fireside.fm/bsdnow/rss|207
https://feeds.simplecast.com/BqbsxVfO|1
http://api.sr.se/api/rss/pod/4023|222
https://podcasts.files.bbci.co.uk/p02pc9qx.rss|222
https://www.thetype.com/typechat/feed/|170
http://podcasts.files.bbci.co.uk/p05t3gr2.rss|56
http://feeds.feedburner.com/podnutzpro?format=xml|30
https://radicalpersonalfinance.libsyn.com/rss|165
https://www.thenakedscientists.com/naked_scientists_podcast.xml|222
https://essay-faz.podigee.io/feed/mp3|113
http://cre.fm/feed/m4a|222
https://www.functionalgeekery.com/feed/mp3/|75
https://www.pipes.digital/feed/1N5Aa3Ng|221
https://www.thejazzsession.com/feed/?category_name=podcast|221
https://libwww.freelibrary.org/podcast/rss.cfm|222
https://feeds.podcastmirror.com/makeshiftstories|222
https://onscript.study/feed/podcast|215
http://www.rtve.es/api/programas/2058/audios.rss|20
https://feeds.megaphone.fm/offtopic|114
http://freakshow.fm/feed/m4a|152

auouymous · 2022-03-24T00:37:47Z

http://feeds.themoth.org/themothpodcast The Moth features people telling true, engaging, funny, touching and eye-opening stories from their lives. is one line but quite long and would require at least two lines in a narrow shownotes.

Terravision ist der erste virtuelle Globus, das Original hinter Google Earth und die Vorlage für den Film &quot;The Billion Dollar Code&quot; from http://cre.fm/feed/m4a is not only long but also contains HTML entities. I don't know why, but the text shownotes description properly converts HTML entities and the title displays them raw. I'd guess the entities are allowed since subtitle is used as a short description in itunes.

http://feeds.feedburner.com/se-radio, which has an HTML episode description and episode text description contains the channel description, puts the channel name in subtitle. :)

title:  Episode 504: Frank McSherry on Materialize
subtitle:  SE-Radio

auouymous · 2022-04-10T05:37:48Z

Rebased and applied SQL fix.

elelay · 2022-06-18T16:08:30Z

ok to drop subtitle then. Can you make it in this branch?

auouymous · 2022-06-18T16:44:12Z

Can you make it in this branch?

@elelay What do you mean by that?

elelay · 2022-06-18T20:45:25Z

ah, sorry it wasn't clear.
I meant can you do the work and add it to this pull request.
Thanks,

Co-authored-by: Eric Le Lay <elelay@macports.org>

PR gpodder#1094 generated an HTML description for any episode that lacked one. That however increased the database size (almost double in worst case) because it was storing both text and html versions of each description. This fixes that by storing the episode thumbnail URL in the database and generating the HTML description only when shownotes are drawn. The text description is now cleared for episodes with an HTML description. This further reduces database size for feeds that provide both. It also fixes an issue for feeds that provide different text and HTML descriptions, because the short description would show the text description and shownotes would show the HTML description. And EQL only searched the text descriptions, which might not match what the user sees in the shownotes.

auouymous · 2022-06-20T07:25:00Z

Subtitle removed.

elelay · 2022-06-20T11:25:08Z

Merged, thanks!

auouymous requested a review from elelay March 12, 2022 08:40

elelay requested changes Mar 16, 2022

View reviewed changes

auouymous force-pushed the ondemand-html-description branch from e10bbcc to 41a369a Compare March 21, 2022 01:11

elelay approved these changes Mar 21, 2022

View reviewed changes

auouymous force-pushed the ondemand-html-description branch from 41a369a to 564200e Compare March 22, 2022 03:52

auouymous force-pushed the ondemand-html-description branch from 8fe0bb8 to fec6092 Compare April 10, 2022 05:36

auouymous force-pushed the ondemand-html-description branch from fec6092 to 6c2bd0b Compare June 3, 2022 10:57

auouymous and others added 3 commits June 19, 2022 11:56

Add episode art URL and chapters to database.

97d9459

Co-authored-by: Eric Le Lay <elelay@macports.org>

get chapters from feedparser

0acaee4

auouymous force-pushed the ondemand-html-description branch from 6c2bd0b to 0acaee4 Compare June 20, 2022 06:32

elelay merged commit b1bd429 into gpodder:master Jun 20, 2022

auouymous deleted the ondemand-html-description branch June 20, 2022 19:22

Uh oh!

Generate HTML episode description only when needed. #1246

Generate HTML episode description only when needed. #1246

Uh oh!

Conversation

auouymous commented Mar 12, 2022

Uh oh!

elelay commented Mar 16, 2022

Uh oh!

elelay left a comment

Choose a reason for hiding this comment

Uh oh!

auouymous commented Mar 17, 2022

Uh oh!

elelay commented Mar 18, 2022

Uh oh!

auouymous commented Mar 21, 2022

Uh oh!

elelay commented Mar 21, 2022

Uh oh!

elelay commented Mar 21, 2022

Uh oh!

elelay left a comment

Choose a reason for hiding this comment

Uh oh!

auouymous commented Mar 22, 2022

Uh oh!

elelay commented Mar 23, 2022

Uh oh!

auouymous commented Mar 24, 2022

Uh oh!

auouymous commented Apr 10, 2022

Uh oh!

elelay commented Jun 18, 2022

Uh oh!

auouymous commented Jun 18, 2022

Uh oh!

elelay commented Jun 18, 2022

Uh oh!

auouymous commented Jun 20, 2022

Uh oh!

elelay commented Jun 20, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants