Conversation
Regression since koreader#1065 Probably because of implicit type conversions to/from Lua and the fact that Lua numbers are double internally? Or something.
|
This leads me to wonder what the hell we use MD5 for in the first place, since we failed to notice visible breakage despite everything being hashed as |
|
(Caught this purely by accident while checking out a metadata record manually while testing koreader/koreader#3785 (comment) :D). |
|
I think it's mainly for progress sync, which I personally don't use. |
|
Mhhh, I think I just witnessed (still with v2020.03.2-24-g5f771ed_2020-04-05) something with md5, which is also used by the statistics plugin where a book is uniquely identified bt (title, authors, md5). These are all different books, except this morning, 1078 appeared, which is a duplicate of 1074 (so, my stats for this book are splitted in 2). Is this related to this issue & fix ? |
|
Oh right, statistics too. 👍 |
|
The ffffffff... one is certainly what I saw before this PR. So I'd assume that the 00000000... one could be that, too. The mixed stuff bracketed in 00000000 on both edges looks like the CI's Clang failures, so, :?. No idea about the clashes. After a quick look at my own db (on the H2O, which I don't really use), the only collisions I have are for djvu documents (I actually have a bunch of different id's for the same djvu document (the one from the testsuite)?! |
|
Looking at my statistics.sqlite3, my first duplicates (same md5 shared by 2 different books) has a last_open 2019/11/20. When opening a book, purging manually the .sdr/, re-opening, the md5sum in metadata.lua is consistent, both on x86 and arm. |
|
I'll double-check on the Forma, but given that I rarely document swap, I don't expect to have anything much to add to the hunt :/. |
|
There's a function called partial MD5 somewhere. It only takes the first few hundred kilobytes or something. Even so 2% seems absurdly high, that should be more like increasing the chance of collision from minuscule to extremely unlikely. |
|
Probably caused by and since koreader/koreader#5528.
So, we might be re-using the one from the previous book, and when a new book is opened, it's not computed ? Or something... (because if true, that should have happened quite more often...) edit: unless plugins are fully re-parsed: https://github.com/koreader/koreader/blob/f756a6aaa75b81201fbf302adc9d21cef560c810/frontend/pluginloader.lua#L80 |
|
I'm not fully sure how Lua deals with that in the context of pcall. Normally speaking it shouldn't be. |
|
OK, they are Confirmed with some logs, switching Cleaning sdr/s , starting with the 2nd book Cleaning sdr/s , opening one: Quitting, relauching on this one just opened, and loading the other one (with no sdr yet): |
|
I guess this is just enough to solve the issue: --- a/plugins/statistics.koplugin/main.lua
+++ b/plugins/statistics.koplugin/main.lua
@@ -1960,11 +1960,11 @@ function ReaderStatistics:saveSettings()
}
G_reader_settings:saveSetting("statistics", settings)
end
function ReaderStatistics:onReadSettings(config)
- self.data = config.data.stats
+ self.data = config.data.stats or {}
end
function ReaderStatistics:onReaderReady()
-- we have correct page count now, do the actual initialization work
self:initData()On a newly opened book, the doc_setting.stats does not exist yet, so self.data (it is assigned nil, so not created and not masking the base object one) is the "global/base object" one, that is filled with the md5 (and saved to the doc_setting); |
|
OK, pushed a fix, that looks sane enough. I'm still unsure how this happened to me for these books: I guess the first entry might have had a wrong md5 leaked from a previous book - but then, one need to have |
Regression since #1065
Probably because of implicit type conversions to/from Lua
and the fact that Lua numbers are double internally?
Or something.
This change is