Conversation
That leads to fix up of anchors analysis and probably other issues such as floating number of found urls etc
linkcheck/parser/__init__.py
Outdated
| content = url_data.get_content() | ||
| with parse_mutex: | ||
| parser.feed(content) | ||
| parser.flush() |
There was a problem hiding this comment.
I may be missing previous context, but it's unclear to me why parser.feed() is not thread-safe, but parser.flush() is fine?
Are we using the same parser instance in multiple threads? (It doesn't look like we do.)
Is the parser using global state shared between multiple parser instances?
Is it perhaps GNU bison that's giving us a thread-unsafe parser?
Perhaps we should prioritize #119 that replaces our homegrown C parser with BeatifulSoup?
There was a problem hiding this comment.
Oh, necessity to mutex it came from only brief code analysis and trial/error tests... Could be flush needs it too, and that I just didn't manage to trip is unsafe operation. I will shift it into withing murex block too just to be safe(r)
Just a safety measure, not yet proven to be required but overall makes sense
|
pushed that little patch offsetting flush to be under mutex as well. Would be nice if there was a test demonstrating its need... unfortunately the one targeting anchors being added in #194 is not triggering it AFAIK, and originally I detected it while running on a ad-hoc test case I simulated in local collection of .htmls I generated by building https://github.com/bids-standard/bids-specification . |
mgedmin
left a comment
There was a problem hiding this comment.
This looks good to me, for values of "good" meaning "omg everything's terrible, let's get rid of our custom C code ASAP, but until we actually do that it's better to have this bandaid than to keep corrupting memory or whatever the consequences of this lack of thread-safety are".
(I'm commenting instead of approving because I would like to know exactly where in the C code things are breaking without the mutex to be sure that this fix is correct. Unfortunately I have no time/inclination to study the C sources generated by bison.)
who would? ;-) |
|
what should be done to get at least this one merged? ;-) |
|
I would like to hear @anarcat's opinion. |
anarcat
left a comment
There was a problem hiding this comment.
i'm terrified by this stuff. like @mgedmin said:
This looks good to me, for values of "good" meaning "omg everything's terrible, let's get rid of our custom C code ASAP, but until we actually do that it's better to have this bandaid than to keep corrupting memory or whatever the consequences of this lack of thread-safety are".
So until we get rid of those horrors, let's just move ahead with this if if actually fixes a bug. Unfortunately, it's not clear if this fixes completely #179 - I wish we could just map this MR to a specific bug report at least. But considering I don't really understand the problem and locking here shouldn't introduce any critical security issue, i'm okay with trusting people that actually did the debugging and hope for the best. :)
thanks for your contribution.
|
@mgedmin i'll let you do the honors considering you actually herded this considerably more than i did. :) |
|
Ha! You just want me to feel responsible if this ever backfires ;) |
|
yes, of course, that too. :p |
|
Thank you guys! I still hope some time to get back to #194 -- feel welcome to beat me to it ;-) |
Portion from #194 which is trivial and independent from that one which might take longer to fixup properly.