Skip to content

Mutex around parse#198

Merged
mgedmin merged 3 commits intolinkchecker:masterfrom
yarikoptic:bf-mutex-parse
Nov 22, 2018
Merged

Mutex around parse#198
mgedmin merged 3 commits intolinkchecker:masterfrom
yarikoptic:bf-mutex-parse

Conversation

@yarikoptic
Copy link
Copy Markdown
Contributor

Portion from #194 which is trivial and independent from that one which might take longer to fixup properly.

That leads to fix up of anchors analysis and probably other issues
such as floating number of found urls etc
content = url_data.get_content()
with parse_mutex:
parser.feed(content)
parser.flush()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing previous context, but it's unclear to me why parser.feed() is not thread-safe, but parser.flush() is fine?

Are we using the same parser instance in multiple threads? (It doesn't look like we do.)

Is the parser using global state shared between multiple parser instances?

Is it perhaps GNU bison that's giving us a thread-unsafe parser?

Perhaps we should prioritize #119 that replaces our homegrown C parser with BeatifulSoup?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, necessity to mutex it came from only brief code analysis and trial/error tests... Could be flush needs it too, and that I just didn't manage to trip is unsafe operation. I will shift it into withing murex block too just to be safe(r)

Just a safety measure, not yet proven to be required but overall
makes sense
@yarikoptic
Copy link
Copy Markdown
Contributor Author

pushed that little patch offsetting flush to be under mutex as well.

Would be nice if there was a test demonstrating its need... unfortunately the one targeting anchors being added in #194 is not triggering it AFAIK, and originally I detected it while running on a ad-hoc test case I simulated in local collection of .htmls I generated by building https://github.com/bids-standard/bids-specification .

Copy link
Copy Markdown
Contributor

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, for values of "good" meaning "omg everything's terrible, let's get rid of our custom C code ASAP, but until we actually do that it's better to have this bandaid than to keep corrupting memory or whatever the consequences of this lack of thread-safety are".

(I'm commenting instead of approving because I would like to know exactly where in the C code things are breaking without the mutex to be sure that this fix is correct. Unfortunately I have no time/inclination to study the C sources generated by bison.)

@yarikoptic
Copy link
Copy Markdown
Contributor Author

I have no time/inclination to study the C sources generated by bison

who would? ;-)
Unfortunately I've not kept an original ad-hoc modified sample to test anchors situation and ATM without it cannot reproduce fluctuating reported links count to provide you undeniable evidence of positive effects.

@yarikoptic
Copy link
Copy Markdown
Contributor Author

what should be done to get at least this one merged? ;-)

@mgedmin mgedmin requested a review from anarcat November 22, 2018 09:53
@mgedmin
Copy link
Copy Markdown
Contributor

mgedmin commented Nov 22, 2018

I would like to hear @anarcat's opinion.

Copy link
Copy Markdown
Contributor

@anarcat anarcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm terrified by this stuff. like @mgedmin said:

This looks good to me, for values of "good" meaning "omg everything's terrible, let's get rid of our custom C code ASAP, but until we actually do that it's better to have this bandaid than to keep corrupting memory or whatever the consequences of this lack of thread-safety are".

So until we get rid of those horrors, let's just move ahead with this if if actually fixes a bug. Unfortunately, it's not clear if this fixes completely #179 - I wish we could just map this MR to a specific bug report at least. But considering I don't really understand the problem and locking here shouldn't introduce any critical security issue, i'm okay with trusting people that actually did the debugging and hope for the best. :)

thanks for your contribution.

@anarcat
Copy link
Copy Markdown
Contributor

anarcat commented Nov 22, 2018

@mgedmin i'll let you do the honors considering you actually herded this considerably more than i did. :)

@mgedmin
Copy link
Copy Markdown
Contributor

mgedmin commented Nov 22, 2018

Ha! You just want me to feel responsible if this ever backfires ;)

@mgedmin mgedmin merged commit 2142920 into linkchecker:master Nov 22, 2018
@anarcat
Copy link
Copy Markdown
Contributor

anarcat commented Nov 22, 2018

yes, of course, that too. :p

@yarikoptic
Copy link
Copy Markdown
Contributor Author

Thank you guys! I still hope some time to get back to #194 -- feel welcome to beat me to it ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants