Skip to content

Fix XmlTagUrlParser and make Python 3 compatible#335

Merged
mgedmin merged 1 commit intolinkchecker:masterfrom
cjmayo:sitemap
Oct 29, 2019
Merged

Fix XmlTagUrlParser and make Python 3 compatible#335
mgedmin merged 1 commit intolinkchecker:masterfrom
cjmayo:sitemap

Conversation

@cjmayo
Copy link
Copy Markdown
Contributor

@cjmayo cjmayo commented Oct 26, 2019

URLs within a sitemap file were not being captured.

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Oct 26, 2019

It looks like sitemap checking never worked, this is an attempt to fix that as well as Python 3 support. Having said that I haven't done a successful check of a real world sitemap file.

I have gone back to url_data.get_content(), i.e. passing a string to xmlparser.Parse() and it is succeeding here on Python 2. @mgedmin I don't know what test data you used that failed, maybe this works because the test file states an encoding? Else we could pass the encoding from url_data to xml.parsers.expat.ParserCreate() and force the issue.

Copy link
Copy Markdown
Contributor

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch about the sitemap parser being broken!

Apologies for creating extra work for you, but the regression test I thought I enabled wasn't actually testing the thing I thought it was testing -- passing unicode data to the expat parser on Python 2 "works" as long as that unicode data contains only characters in the ASCII subset.

URLs within a sitemap file were not being captured.
Copy link
Copy Markdown
Contributor

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mgedmin mgedmin merged commit c294a4e into linkchecker:master Oct 29, 2019
@cjmayo cjmayo deleted the sitemap branch November 21, 2019 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants