Skip to content

Simple News (RSS/Atom) downloader plugin#2592

Merged
Frenzie merged 55 commits intokoreader:masterfrom
mwoz123:simple-rss-downloader-plugin
Apr 19, 2017
Merged

Simple News (RSS/Atom) downloader plugin#2592
Frenzie merged 55 commits intokoreader:masterfrom
mwoz123:simple-rss-downloader-plugin

Conversation

@mwoz123
Copy link
Copy Markdown
Contributor

@mwoz123 mwoz123 commented Feb 27, 2017

Simple News (RSS/Atom) downloader plugin.
Downloads feeds (and their entries) listed in feeds.xml and stores as .html in News directory.

As this is my first Lua code ever please be lenient.
Feel free to change/refactor/improve it etc;)

end


function NewsDownloader:createValidFilename(name)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually have a utility function for this. See https://github.com/koreader/koreader/pull/2464/files

@Frenzie
Copy link
Copy Markdown
Member

Frenzie commented Feb 27, 2017

Great initiative! I dropped one quick comment based on a skim. The XML parser would probably also be useful for the OPDS browser. Btw, take a look at the travis log at the end: https://s3.amazonaws.com/archive.travis-ci.org/jobs/205933661/log.txt

There are a great many warnings about trailing whitespaces and such.

@houqp
Copy link
Copy Markdown
Member

houqp commented Feb 28, 2017

Nice! Since this is a very big patch, it will take us awhile to fully review it. I try try my best to read through it as fast as possible ;)

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Mar 1, 2017

@Frenzie - I've replaced my function for valid names with util.replaceInvalidChars & removed tail spaces.

@houqp - my change isn't to big - just 138 lines.
Lib folder context's only - https://github.com/manoelcampos/LuaXML. So not sure if you really have to check that?

As currently I don't have time (and knowledge - majority of code was "copy-paste" from internet lua examples) I'm afraid I won't be able to fix anything more.

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Mar 3, 2017

One thing I'd like to add to this code is to check for device (Android,Kindle etc -currently at least on android it's not writeable) and find correct writeable dir for each device..

I bet there's already similar code in Koreader. Could you help me find such example?

ps: I hope you don't want me to correct used library (LuaXml) to be aligned with koreader travis?

@Frenzie
Copy link
Copy Markdown
Member

Frenzie commented Mar 3, 2017

ps: I hope you don't want me to correct used library (LuaXml) to be aligned with koreader travis?

It wouldn't be an unreasonable expectation actually. You just have to turn it around. ;-) See https://github.com/koreader/koreader/blob/fc5e8f1242eb33912f5d1c0ab7d43528a69a7334/.luacheckrc

@Frenzie
Copy link
Copy Markdown
Member

Frenzie commented Mar 3, 2017

Oh, but I'd wait for @houqp to weigh in. I just wanted to point out that there are two ways to fix the Travis result, only one of which is difficult (but typically the right thing to do).

@Hzj-jie
Copy link
Copy Markdown
Contributor

Hzj-jie commented Mar 5, 2017

I have not had a look at the code. But data storage can always give you a writable folder.
https://github.com/koreader/koreader/blob/master/datastorage.lua

@houqp
Copy link
Copy Markdown
Member

houqp commented Mar 5, 2017

@mwoz123, no worry, we can take it from here if you don't have time to update the patch. I am mostly reading the code to figure out whether those external libraries are really required. We usually disable the static code analysis for external libraries anyway.

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Mar 13, 2017

This will be a final version once I fix travis warning - please review.

Can someone help me get rid of this warning:
plugins/newsdownloader.koplugin/main.lua:131:22: accessing undefined variable 'simpleTreeHandler'
plugins/newsdownloader.koplugin/main.lua:134:21: accessing undefined variable 'xmlParser'
?
I've no idea how to fix it:(

@houqp
Copy link
Copy Markdown
Member

houqp commented Mar 14, 2017

simpleTreeHandler is an internal function within handler module, so it's not visible to main.lua. You need to export the function in handler.lua and require handler.lua in main.lua.

Same for xmlParser which only lives in xml.lua.

@@ -0,0 +1,8 @@
<?xml version="1.0" encoding="UTF-8"?>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use lua for this configuration? The you can merge it with newsConfig.lua into one config.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had thought about that. But since:

  1. Lua files are specyfic for lua and XML files are more general/standard and used all over Internet and PC
  2. you don't have to be famirial with LUA to edit XML files
  3. so more people are able to edit XML over LUA files
  4. you don't have to be a lua developer to edit XML files
  5. newsConfig contains other kind of configuration (folder and filenames used by program).
  6. on Android we wouldn't be able to edit newsConfig.lua (this is one of the most important for me as I'm running android) so we wouldn't be able to change default feeds (in current implementation), [on Android feed xml config is copied to writeable location)
  7. I don't know howto create a clean and meaningful list in LUA config file & don't have time to learn more LUA

I'd like to leave it as it is. Is that ok?

It would make sense to change it to configuration is we had GUI for editing feed urls, but I currently don't plan to do so, maybe in some time in future if I had time.

So please for first edition approve it this way, if I had time in future I'll try to change it to UI edition. But it might be in couple months, year or more. Let's not wait so long for this pull request. Ok?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ad. 3. XML is open "protocol" commonly used in many technologies, many people (even not developers) are familiar with it.
And lua files are well known only to lua programmers.
So much more people are ale to edit XML than lua files.

Ad. 5. Feeds.xml contains data that MUST/should be modified by user.

NewsConfig.lua contains data that should NOT be modified ( at least not by unexperienced users).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lua files are specyfic for lua and XML files are more general/standard and used all over Internet and PC
you don't have to be famirial with LUA to edit XML files
so more people are able to edit XML over LUA files
you don't have to be a lua developer to edit XML files

I agree with you XML has a much larger audience compared to Lua syntax.

However, please keep in mind that many of koreader's users do not have technical background, so they might not even know what XML is. In fact, XML is not very friendly for a human to read.

Compare:

<?xml version="1.0" encoding="UTF-8"?>
<feeds>
	<!-- List your feeds here: -->
	<feed limit="2">http://www.pcworld.com/index.rss</feed>
	<feed limit="3">http://rss.cnn.com/rss/edition_world.rss</feed>
	<!-- set limit to "0" means no download, "-1" no limit. --> 
	<feed limit="0">http://feeds.reuters.com/reuters/UKSportsNews?format=xml</feed>
</feeds>

With:

return {
    -- list your feeds here:
    { "http://www.pcworld.com/index.rss", limit = 2 },
    { "http://rss.cnn.com/rss/edition_world.rss", limit = 3 },
    -- set limit to "0" means no download, "-1" no limit.
    { "http://feeds.reuters.com/reuters/UKSportsNews?format=xml", limit = 0 },
}

Which one do you think is more readable by human, especially to those non-technical users?

And for the current users, they already have to know a little bit of Lua to edit the existing setting files, add a new config in XML will make it worse because now users have to learn both Lua and XML. In fact, I have plan to migrate all configurations from Lua to a proper config DSL like yaml or toml:

# list your feeds here:
- url: "http://www.pcworld.com/index.rss"
  limit: 1
- url: "http://rss.cnn.com/rss/edition_world.rss"
  limit: 2
# set limit to "0" means no download, "-1" no limit.
- url: "http://feeds.reuters.com/reuters/UKSportsNews?format=xml"
  limit: 0

XML is not just hard to read by human, but also can't be parsed efficiently by machines. There are many serialization formats that are much more efficient than XML. The only benefit of XML I can think of is flexibility, but a properly designed data structure should be able to be represented in a simple DSL like json.

on Android we wouldn't be able to edit newsConfig.lua (this is one of the most important for me as I'm running android) so we wouldn't be able to change default feeds (in current implementation), [on Android feed xml config is copied to writeable location)

I have merged newsConfig.lua into main.lua in my PR. See explanations in my commit message: mwoz123@8b4e9b9

You will be able to edit it too if the lua config is copied to the same writeable location.

I don't know howto create a clean and meaningful list in LUA config file & don't have time to learn more LUA

I understand this, that's why I am offering the help to finish this part for you ;)

I am happy to discuss more with you on this if you still think XML config is a better choice.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a properly designed data structure should be able to be represented in a simple DSL like json.

JSON is a pain imo. When manually editing config files I like to:

return {
    -- list your feeds here:
    { "http://www.pcworld.com/index.rss", limit = 2 },
    { "http://rss.cnn.com/rss/edition_world.rss", limit = 3 },
    -- set limit to "0" means no download, "-1" no limit.
    -- LOOK MA I QUICKLY DISABLED THE NEXT LINE
    --{ "http://feeds.reuters.com/reuters/UKSportsNews?format=xml", limit = 0 },
}

But JSON doesn't allow that. XML (and Lua, yaml, and traditional INI files, etc.) does.

Anyway, I'm also against XML. I can read and write it fluently, but it's rather verbose for the kind of thing we're talking about.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind to have the feeds configuration in different format as long it's clear to read and we have examples there.

I just used XML as I was already familiar with parsing that in lua.

I was also against mixing feeds configuration with application variables.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON is a pain imo. When manually editing config files I like to:

Yes, that's why I didn't mention it when I was talking about which DSL to pick for KOReader :) I am using json mostly as an example for how far simple data structure can get you.

JSON is also not designed for better readability and the encode/decoding performance is really bad. I would not pick JOSN for any of my project unless I need to integrate with a web UI or any legacy REST system.

I also wrote a lot of XML 6 years ago when I was writing chapters using docbook. Oh god, it was painful.

function NewsDownloader:removeAllExceptFeedConfig(dir, rmdir)
local ffi = require("ffi");

require("ffi/zeromq_h")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see you using zeromq here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'll remove that.

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Mar 14, 2017

simpleTreeHandler is an internal function within handler module, so it's not visible to main.lua. You need to export the function in handler.lua and require handler.lua in main.lua.

As this is external library (and so I woudn't like to make changes in external libs), can we ignore that warning?
Maybe as thoose are the only warnings in that method could we exclude that method from being checked? I tried to add '-- luacheck: ignore' after method name but it didn't work. Is there a other way to do it?

Ofcourse I'll fix warnings in other methods.

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Mar 18, 2017

Fixed warnings. All checks pass. Can you merge it now?

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Mar 25, 2017

Guys, any comments? When you 'll find time to review it?

text = _("Help"),
callback = function()
UIManager:show(InfoMessage:new{
text = _("Script uses config from " .. self:getNewsDirPath() .. config.FEED_FILE_NAME .. " to download feeds to " .. self:getNewsDirPath() .. " directory."),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The translation won't function this way. It needs to be something a little more like this:

text = _("Script uses config from %1%2 to download feeds to %3 directory.", self:getNewsDirPath(), config.FEED_FILE_NAME, self:getNewsDirPath()),

end

UIManager:show(InfoMessage:new{
text = _("Downloading News Finished.")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be lowercase.

Downloading news finished.

for index, feed in pairs(feedSources.feeds.feed) do
local url = feed[1];
UIManager:show(InfoMessage:new{
text = _("Processing: ") .. url,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_("Processing: %1", url)

local feedXmlFilePath = self:getFeedXmlPath();

if not util.file_exists(feedXmlFilePath) then
DEBUG("Creating init configuration");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefix the debug message with where it's coming from.

@Frenzie
Copy link
Copy Markdown
Member

Frenzie commented Mar 25, 2017

Thanks!

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Mar 26, 2017

Changes after review done. Lua check warnings fixed.
Please review again.

Frenzie
Frenzie previously approved these changes Mar 26, 2017
@Frenzie Frenzie dismissed their stale review March 26, 2017 10:55

remarks taken care of

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Mar 26, 2017

@Frenzie - thanks.

Any other blockers before it can be merged to master? @houqp ?

@houqp
Copy link
Copy Markdown
Member

houqp commented Mar 27, 2017

Sorry, busy dealing with personal life events right now. I will send a PR to your branch to address some of the issues as soon as possible.

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Mar 27, 2017

@houqp Ok. Thanks. Np. Looking forward for that PR;)

@houqp
Copy link
Copy Markdown
Member

houqp commented Apr 12, 2017

OK, I will merge it after we do the final change to migrate the config over. I think this is important because once the code is merged in, people will start using it in the nightly builds. And when we change it to another format afterwards, it will firstly break their existing configs and secondly leave out the old config to not be cleaned up. We can add code to automatically migrate the config, but it will be bloating the code base for a temporary change. If you don't have time to work on this part, I will help get it implemented. This will be one of the highest priority item on my todo list FWIW.

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Apr 13, 2017

Any examples, where such configuration was used, so maybe I could also try to help?

@houqp
Copy link
Copy Markdown
Member

houqp commented Apr 13, 2017

https://github.com/koreader/koreader/blob/master/plugins/perceptionexpander.koplugin/main.lua is a good example for both managing settings using LuaSettings module and linking it to the UI.

I am happy to merge without the UI part, I think that can be added later. I just want to make sure users are not hit by the config migration issue.

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Apr 14, 2017

so what should be the new feed configuration format, because I'm now confused ? Lua?

@Frenzie
Copy link
Copy Markdown
Member

Frenzie commented Apr 14, 2017

Yeah, @houqp proposed something alone these lines above:

return {
    -- list your feeds here:
    { "http://www.pcworld.com/index.rss", limit = 2 },
    { "http://rss.cnn.com/rss/edition_world.rss", limit = 3 },
    -- set limit to "0" means no download, "-1" no limit.
    { "http://feeds.reuters.com/reuters/UKSportsNews?format=xml", limit = 0 },
}

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Apr 14, 2017

So Lua will be the final configuration type?Is it the last that change before merge?
You don't plan to migrate it to yaml or something else?

Anyway as we already have "initial configuration" functionality it will mean that any configuration (e.g. structure) changes it won't break plugin but rather cause download the initial example feeds, and so user will have to (again) manually update the new config file (which is ofcourse a drawback).
But IMHO waiting so long (as it's already working fine version) before merge is also a drawback..

And as Koreader is not provided out of the box with any readers, its users have to be at least some technical.
So any user that is able to edit the configuration (no matter if it's a xml,lua, or something else) should be also able to edit new conf.

@houqp
Copy link
Copy Markdown
Member

houqp commented Apr 14, 2017

So Lua will be the final configuration type?Is it the last that change before merge?

Lua will be the config DSL for now. This is the last blocker from me as I mentioned in my previous review comment.

You don't plan to migrate it to yaml or something else?

We do have the plan to migrate to a proper config DSL, but that's a large undertaking project because we need to make sure all Lua configs are handled properly and seamlessly without interruption to the users.

Anyway as we already have "initial configuration" functionality it will mean that any configuration (e.g. structure) changes it won't break plugin but rather cause download the initial example feeds, and so user will have to (again) manually update the new config file (which is ofcourse a drawback).
But IMHO waiting so long (as it's already working fine version) before merge is also a drawback..

Like I said in my previous comment, there are two problems: 1. users have to manually update the new config. 2. old config will be left there untouched, users will be confused because they don't know if it's ok to delete it or not.

BTW, it wasn't a working fine version until you merged my two patches last night. The UI update was not working due to blocking function calls.

And as Koreader is not provided out of the box with any readers, its users have to be at least some technical.
So any user that is able to edit the configuration (no matter if it's a xml,lua, or something else) should be also able to edit new conf.

I am not saying users won't be able to edit lua config, its already proven that most of the users have no problem understanding and editing it. It's that extra work that we are imposing to the users because we developers don't want to spend extra time and effort polishing the code.

I consider myself a technical user, and I am certainly not happy every time I had to manually fix the stuff I use due to developer laziness.

@houqp
Copy link
Copy Markdown
Member

houqp commented Apr 14, 2017

Again, like I mentioned many times before, if you can't wait to use this on your device, you can always use the extra plugin look up path feature to load external plugins that's not shipped by koreader. I literally wrote that feature for you so you can load extra plugins in android.

@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Apr 17, 2017

Failure → spec/front/unit/version_spec.lua @ 7

Version module should get current revision

spec/front/unit/version_spec.lua:8: Expected objects to be the same.

Passed in:

(boolean) false

Expected:

(boolean) true

stack traceback:

	spec/front/unit/version_spec.lua:8: in function <spec/front/unit/version_spec.lua:7>

make: *** [testfront] Error 1

The command ".ci/script.sh" exited with 2.`

Could you fix that?

@Frenzie
Copy link
Copy Markdown
Member

Frenzie commented Apr 17, 2017

We can safely ignore that error for this PR, but rebasing on top of the latest master should also fix it (see #2767).

-- only supports http URL for now
-- Atom is currently not supported, only RSS
-- set limit to "0" means no download, "-1" no limit.
{ "http://www.huffingtonpost.com/feeds/index.xml", limit = 2 },
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wikipedia says this portal is "overtly liberal/left commentary".
I don't want this plugin to be connected with politician option.

Let's use something that's neutral view if not then let's replace that with sport source.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, although apparently easier said than done. Browsing around a little, it would seem that most sources only give you the headline in the feed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we not rely on "description" but rather url link to html document it shouldn't be so hard.

Just need to find RSS format (as atom is yet not supported/fully tested) neutral news source if that would be too hard let's take a sport source.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any proposals?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like NPR or the BBC?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BBC got RSS-like structure but atom in "header" not sure if that's will be handled correctly.

What you think about AP?
BTW @Frenzie I merged from master but it didn't solve the travis issue.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AP relays on injection XMLs into pages, so it wasn't correct on our simple "wget" we do for download.

I changed that to economist science and culture(disabled by default) as they should be neutral.
Fell free to update it to something better that's international, RSS & neutral.

@mwoz123 mwoz123 force-pushed the simple-rss-downloader-plugin branch 2 times, most recently from 48c8c86 to d09fd67 Compare April 18, 2017 20:45
@mwoz123 mwoz123 force-pushed the simple-rss-downloader-plugin branch from d09fd67 to e14e4d0 Compare April 18, 2017 20:48
@mwoz123
Copy link
Copy Markdown
Contributor Author

mwoz123 commented Apr 18, 2017

@Frenzie merge from master didn't solve the travis error, but as you said it can be ignored I think we're finally ready to merge to master?

@Frenzie Frenzie merged commit d6c81f5 into koreader:master Apr 19, 2017
@Frenzie
Copy link
Copy Markdown
Member

Frenzie commented Apr 19, 2017

Merging now since @houqp said the configuration was the last blocker.

@Frenzie Frenzie mentioned this pull request Apr 24, 2017
@mwoz123 mwoz123 deleted the simple-rss-downloader-plugin branch October 28, 2017 17:49
@roygbyte
Copy link
Copy Markdown
Contributor

Pinging this thread to see if folks want to contribute ideas towards Newsdownloader's rebuild. See this thread for more info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants