Simple News (RSS/Atom) downloader plugin#2592
Conversation
| end | ||
|
|
||
|
|
||
| function NewsDownloader:createValidFilename(name) |
There was a problem hiding this comment.
We actually have a utility function for this. See https://github.com/koreader/koreader/pull/2464/files
|
Great initiative! I dropped one quick comment based on a skim. The XML parser would probably also be useful for the OPDS browser. Btw, take a look at the travis log at the end: https://s3.amazonaws.com/archive.travis-ci.org/jobs/205933661/log.txt There are a great many warnings about trailing whitespaces and such. |
|
Nice! Since this is a very big patch, it will take us awhile to fully review it. I try try my best to read through it as fast as possible ;) |
|
@Frenzie - I've replaced my function for valid names with util.replaceInvalidChars & removed tail spaces. @houqp - my change isn't to big - just 138 lines. As currently I don't have time (and knowledge - majority of code was "copy-paste" from internet lua examples) I'm afraid I won't be able to fix anything more. |
|
One thing I'd like to add to this code is to check for device (Android,Kindle etc -currently at least on android it's not writeable) and find correct writeable dir for each device.. I bet there's already similar code in Koreader. Could you help me find such example? ps: I hope you don't want me to correct used library (LuaXml) to be aligned with koreader travis? |
It wouldn't be an unreasonable expectation actually. You just have to turn it around. ;-) See https://github.com/koreader/koreader/blob/fc5e8f1242eb33912f5d1c0ab7d43528a69a7334/.luacheckrc |
|
Oh, but I'd wait for @houqp to weigh in. I just wanted to point out that there are two ways to fix the Travis result, only one of which is difficult (but typically the right thing to do). |
|
I have not had a look at the code. But data storage can always give you a writable folder. |
|
@mwoz123, no worry, we can take it from here if you don't have time to update the patch. I am mostly reading the code to figure out whether those external libraries are really required. We usually disable the static code analysis for external libraries anyway. |
|
This will be a final version once I fix travis warning - please review. Can someone help me get rid of this warning: |
|
Same for xmlParser which only lives in xml.lua. |
| @@ -0,0 +1,8 @@ | |||
| <?xml version="1.0" encoding="UTF-8"?> | |||
There was a problem hiding this comment.
Why not use lua for this configuration? The you can merge it with newsConfig.lua into one config.
There was a problem hiding this comment.
Yes, I had thought about that. But since:
- Lua files are specyfic for lua and XML files are more general/standard and used all over Internet and PC
- you don't have to be famirial with LUA to edit XML files
- so more people are able to edit XML over LUA files
- you don't have to be a lua developer to edit XML files
- newsConfig contains other kind of configuration (folder and filenames used by program).
- on Android we wouldn't be able to edit newsConfig.lua (this is one of the most important for me as I'm running android) so we wouldn't be able to change default feeds (in current implementation), [on Android feed xml config is copied to writeable location)
- I don't know howto create a clean and meaningful list in LUA config file & don't have time to learn more LUA
I'd like to leave it as it is. Is that ok?
It would make sense to change it to configuration is we had GUI for editing feed urls, but I currently don't plan to do so, maybe in some time in future if I had time.
So please for first edition approve it this way, if I had time in future I'll try to change it to UI edition. But it might be in couple months, year or more. Let's not wait so long for this pull request. Ok?
There was a problem hiding this comment.
Ad. 3. XML is open "protocol" commonly used in many technologies, many people (even not developers) are familiar with it.
And lua files are well known only to lua programmers.
So much more people are ale to edit XML than lua files.
Ad. 5. Feeds.xml contains data that MUST/should be modified by user.
NewsConfig.lua contains data that should NOT be modified ( at least not by unexperienced users).
There was a problem hiding this comment.
Lua files are specyfic for lua and XML files are more general/standard and used all over Internet and PC
you don't have to be famirial with LUA to edit XML files
so more people are able to edit XML over LUA files
you don't have to be a lua developer to edit XML files
I agree with you XML has a much larger audience compared to Lua syntax.
However, please keep in mind that many of koreader's users do not have technical background, so they might not even know what XML is. In fact, XML is not very friendly for a human to read.
Compare:
<?xml version="1.0" encoding="UTF-8"?>
<feeds>
<!-- List your feeds here: -->
<feed limit="2">http://www.pcworld.com/index.rss</feed>
<feed limit="3">http://rss.cnn.com/rss/edition_world.rss</feed>
<!-- set limit to "0" means no download, "-1" no limit. -->
<feed limit="0">http://feeds.reuters.com/reuters/UKSportsNews?format=xml</feed>
</feeds>With:
return {
-- list your feeds here:
{ "http://www.pcworld.com/index.rss", limit = 2 },
{ "http://rss.cnn.com/rss/edition_world.rss", limit = 3 },
-- set limit to "0" means no download, "-1" no limit.
{ "http://feeds.reuters.com/reuters/UKSportsNews?format=xml", limit = 0 },
}Which one do you think is more readable by human, especially to those non-technical users?
And for the current users, they already have to know a little bit of Lua to edit the existing setting files, add a new config in XML will make it worse because now users have to learn both Lua and XML. In fact, I have plan to migrate all configurations from Lua to a proper config DSL like yaml or toml:
# list your feeds here:
- url: "http://www.pcworld.com/index.rss"
limit: 1
- url: "http://rss.cnn.com/rss/edition_world.rss"
limit: 2
# set limit to "0" means no download, "-1" no limit.
- url: "http://feeds.reuters.com/reuters/UKSportsNews?format=xml"
limit: 0XML is not just hard to read by human, but also can't be parsed efficiently by machines. There are many serialization formats that are much more efficient than XML. The only benefit of XML I can think of is flexibility, but a properly designed data structure should be able to be represented in a simple DSL like json.
on Android we wouldn't be able to edit newsConfig.lua (this is one of the most important for me as I'm running android) so we wouldn't be able to change default feeds (in current implementation), [on Android feed xml config is copied to writeable location)
I have merged newsConfig.lua into main.lua in my PR. See explanations in my commit message: mwoz123@8b4e9b9
You will be able to edit it too if the lua config is copied to the same writeable location.
I don't know howto create a clean and meaningful list in LUA config file & don't have time to learn more LUA
I understand this, that's why I am offering the help to finish this part for you ;)
I am happy to discuss more with you on this if you still think XML config is a better choice.
There was a problem hiding this comment.
a properly designed data structure should be able to be represented in a simple DSL like json.
JSON is a pain imo. When manually editing config files I like to:
return {
-- list your feeds here:
{ "http://www.pcworld.com/index.rss", limit = 2 },
{ "http://rss.cnn.com/rss/edition_world.rss", limit = 3 },
-- set limit to "0" means no download, "-1" no limit.
-- LOOK MA I QUICKLY DISABLED THE NEXT LINE
--{ "http://feeds.reuters.com/reuters/UKSportsNews?format=xml", limit = 0 },
}But JSON doesn't allow that. XML (and Lua, yaml, and traditional INI files, etc.) does.
Anyway, I'm also against XML. I can read and write it fluently, but it's rather verbose for the kind of thing we're talking about.
There was a problem hiding this comment.
I don't mind to have the feeds configuration in different format as long it's clear to read and we have examples there.
I just used XML as I was already familiar with parsing that in lua.
I was also against mixing feeds configuration with application variables.
There was a problem hiding this comment.
JSON is a pain imo. When manually editing config files I like to:
Yes, that's why I didn't mention it when I was talking about which DSL to pick for KOReader :) I am using json mostly as an example for how far simple data structure can get you.
JSON is also not designed for better readability and the encode/decoding performance is really bad. I would not pick JOSN for any of my project unless I need to integrate with a web UI or any legacy REST system.
I also wrote a lot of XML 6 years ago when I was writing chapters using docbook. Oh god, it was painful.
| function NewsDownloader:removeAllExceptFeedConfig(dir, rmdir) | ||
| local ffi = require("ffi"); | ||
|
|
||
| require("ffi/zeromq_h") |
There was a problem hiding this comment.
I don't see you using zeromq here?
There was a problem hiding this comment.
Ok. I'll remove that.
As this is external library (and so I woudn't like to make changes in external libs), can we ignore that warning? Ofcourse I'll fix warnings in other methods. |
|
Fixed warnings. All checks pass. Can you merge it now? |
|
Guys, any comments? When you 'll find time to review it? |
| text = _("Help"), | ||
| callback = function() | ||
| UIManager:show(InfoMessage:new{ | ||
| text = _("Script uses config from " .. self:getNewsDirPath() .. config.FEED_FILE_NAME .. " to download feeds to " .. self:getNewsDirPath() .. " directory."), |
There was a problem hiding this comment.
The translation won't function this way. It needs to be something a little more like this:
text = _("Script uses config from %1%2 to download feeds to %3 directory.", self:getNewsDirPath(), config.FEED_FILE_NAME, self:getNewsDirPath()),| end | ||
|
|
||
| UIManager:show(InfoMessage:new{ | ||
| text = _("Downloading News Finished.") |
There was a problem hiding this comment.
Should be lowercase.
Downloading news finished.
| for index, feed in pairs(feedSources.feeds.feed) do | ||
| local url = feed[1]; | ||
| UIManager:show(InfoMessage:new{ | ||
| text = _("Processing: ") .. url, |
| local feedXmlFilePath = self:getFeedXmlPath(); | ||
|
|
||
| if not util.file_exists(feedXmlFilePath) then | ||
| DEBUG("Creating init configuration"); |
There was a problem hiding this comment.
I'd prefix the debug message with where it's coming from.
|
Thanks! |
|
Changes after review done. Lua check warnings fixed. |
|
Sorry, busy dealing with personal life events right now. I will send a PR to your branch to address some of the issues as soon as possible. |
|
@houqp Ok. Thanks. Np. Looking forward for that PR;) |
|
OK, I will merge it after we do the final change to migrate the config over. I think this is important because once the code is merged in, people will start using it in the nightly builds. And when we change it to another format afterwards, it will firstly break their existing configs and secondly leave out the old config to not be cleaned up. We can add code to automatically migrate the config, but it will be bloating the code base for a temporary change. If you don't have time to work on this part, I will help get it implemented. This will be one of the highest priority item on my todo list FWIW. |
|
Any examples, where such configuration was used, so maybe I could also try to help? |
|
https://github.com/koreader/koreader/blob/master/plugins/perceptionexpander.koplugin/main.lua is a good example for both managing settings using I am happy to merge without the UI part, I think that can be added later. I just want to make sure users are not hit by the config migration issue. |
optimization & fix UI repaint for loading info messages
|
so what should be the new feed configuration format, because I'm now confused ? Lua? |
|
Yeah, @houqp proposed something alone these lines above: return {
-- list your feeds here:
{ "http://www.pcworld.com/index.rss", limit = 2 },
{ "http://rss.cnn.com/rss/edition_world.rss", limit = 3 },
-- set limit to "0" means no download, "-1" no limit.
{ "http://feeds.reuters.com/reuters/UKSportsNews?format=xml", limit = 0 },
} |
|
So Lua will be the final configuration type?Is it the last that change before merge? Anyway as we already have "initial configuration" functionality it will mean that any configuration (e.g. structure) changes it won't break plugin but rather cause download the initial example feeds, and so user will have to (again) manually update the new config file (which is ofcourse a drawback). And as Koreader is not provided out of the box with any readers, its users have to be at least some technical. |
Lua will be the config DSL for now. This is the last blocker from me as I mentioned in my previous review comment.
We do have the plan to migrate to a proper config DSL, but that's a large undertaking project because we need to make sure all Lua configs are handled properly and seamlessly without interruption to the users.
Like I said in my previous comment, there are two problems: 1. users have to manually update the new config. 2. old config will be left there untouched, users will be confused because they don't know if it's ok to delete it or not. BTW, it wasn't a working fine version until you merged my two patches last night. The UI update was not working due to blocking function calls.
I am not saying users won't be able to edit lua config, its already proven that most of the users have no problem understanding and editing it. It's that extra work that we are imposing to the users because we developers don't want to spend extra time and effort polishing the code. I consider myself a technical user, and I am certainly not happy every time I had to manually fix the stuff I use due to developer laziness. |
|
Again, like I mentioned many times before, if you can't wait to use this on your device, you can always use the extra plugin look up path feature to load external plugins that's not shipped by koreader. I literally wrote that feature for you so you can load extra plugins in android. |
chore: migrate config to lua & avoid xml parsing crash
Could you fix that? |
|
We can safely ignore that error for this PR, but rebasing on top of the latest master should also fix it (see #2767). |
| -- only supports http URL for now | ||
| -- Atom is currently not supported, only RSS | ||
| -- set limit to "0" means no download, "-1" no limit. | ||
| { "http://www.huffingtonpost.com/feeds/index.xml", limit = 2 }, |
There was a problem hiding this comment.
Wikipedia says this portal is "overtly liberal/left commentary".
I don't want this plugin to be connected with politician option.
Let's use something that's neutral view if not then let's replace that with sport source.
There was a problem hiding this comment.
Agreed, although apparently easier said than done. Browsing around a little, it would seem that most sources only give you the headline in the feed.
There was a problem hiding this comment.
As we not rely on "description" but rather url link to html document it shouldn't be so hard.
Just need to find RSS format (as atom is yet not supported/fully tested) neutral news source if that would be too hard let's take a sport source.
There was a problem hiding this comment.
Quick reserch and Associated Press look's best for me:
AP - http://hosted.ap.org/lineups/WORLDHEADS.rss?SITE=AP&SECTION=HOME
Other proposal:
Reuters http://feeds.reuters.com/Reuters/worldNews?format=xml- use atomCNN http://rss.cnn.com/rss/edition_world.rssuse atom, not sure if neutralNYT http://rss.nytimes.com/services/xml/rss/nyt/World.xmlas above- AP sport news http://hosted2.ap.org/atom/APDEFAULT/347875155d53465d95cec892aeb06419
- economist international news http://www.economist.com/sections/international/rss.xml - TODO check if neutral,
http://www.news.com.au/sport/rss
There was a problem hiding this comment.
BBC got RSS-like structure but atom in "header" not sure if that's will be handled correctly.
What you think about AP?
BTW @Frenzie I merged from master but it didn't solve the travis issue.
There was a problem hiding this comment.
AP relays on injection XMLs into pages, so it wasn't correct on our simple "wget" we do for download.
I changed that to economist science and culture(disabled by default) as they should be neutral.
Fell free to update it to something better that's international, RSS & neutral.
48c8c86 to
d09fd67
Compare
d09fd67 to
e14e4d0
Compare
|
@Frenzie merge from master didn't solve the travis error, but as you said it can be ignored I think we're finally ready to merge to master? |
|
Merging now since @houqp said the configuration was the last blocker. |
|
Pinging this thread to see if folks want to contribute ideas towards Newsdownloader's rebuild. See this thread for more info. |
Simple News (RSS/Atom) downloader plugin.
Downloads feeds (and their entries) listed in feeds.xml and stores as .html in News directory.
As this is my first Lua code ever please be lenient.
Feel free to change/refactor/improve it etc;)