Skip to content

[WIP] Python3 Support#210

Closed
cjmayo wants to merge 3 commits intolinkchecker:masterfrom
cjmayo:python3
Closed

[WIP] Python3 Support#210
cjmayo wants to merge 3 commits intolinkchecker:masterfrom
cjmayo:python3

Conversation

@cjmayo
Copy link
Copy Markdown
Contributor

@cjmayo cjmayo commented Apr 10, 2019

Do not merge - for tracking PRs only

This branch is a rebase of that by @PetrDlouhy in #40, needed because master has moved on.

These commits were dropped because they are already in master:

  • Python3: fix imports
  • fixes for Python 3: fix proxysuport

These needed slight editing to apply because of commits that have since been made to master (code changes are the same):

  • Python3: decode parts before submitting them to urllib.quote()
  • fixes for Python 3 + Travis test: fix cookies

Typos in titles were fixed for:

  • Python3: update makefiles to test in Pyhon3 in allow-failures mode
  • Python3: fix url.enswith in url.py
  • Python3: fix test_parser test for removing invalit entity
  • Python3: fix default encoding in htmllib to allow parsig of entities

The commits in this branch are being submitted in separate PRs against master. PR titles start with {python3_nn} reflecting their branch name. The submission of PRs is limited by the commits that will apply on the current master. Most PRs are single commits, when a commit will not apply on master an attempt is made to add it to the previous branch before stopping.

As well as tracking the PRs for merging hopefully this PR will demonstrate the final state of the CI tests. There are some additional commits that fix various warnings.

@cjmayo cjmayo mentioned this pull request Apr 10, 2019
@mgedmin
Copy link
Copy Markdown
Contributor

mgedmin commented Apr 11, 2019

Thank you very much for doing this hard work!

I have one suggestion: could we have a PR that adds tests on some version of Python 3 to .travis-ci.yml, initially with a matrix: allow_failures so they don't count as real failures? I would love to merge this early so we could then see the number of failing tests decrease as individual pull requests get merged.

@anarcat
Copy link
Copy Markdown
Contributor

anarcat commented Apr 11, 2019

awesome?

should we just close #40 in favor of this one?

i've just more or less blindly merged everything that @mgedmin approved after a quick review, i will mostly try to get the easy ones out of the way and support @mgedmin in every other review. :)

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Apr 12, 2019

I have one suggestion: could we have a PR that adds tests on some version of Python 3 to .travis-ci.yml, initially with a matrix: allow_failures so they don't count as real failures? I would love to merge this early so we could then see the number of failing tests decrease as individual pull requests get merged.

Hasn't that already been done?:
4382215#diff-354f30a63fb0907d4ad57269548329e3

should we just close #40 in favor of this one?

Fine by me. There is a link at the top if anyone wants to see the history.

(Not sure I'll do the force commit on this one too often. I thought it would hide the previous ones and show the overall progress. Instead it's just making this page rather long).

@mgedmin
Copy link
Copy Markdown
Contributor

mgedmin commented Apr 12, 2019

Hasn't that already been done?:
4382215#diff-354f30a63fb0907d4ad57269548329e3

And now everybody knows that

  • I have the memory of a goldfish
  • I never looked at the CI details of the PRs I've reviewed, just the overall green checkmark

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Apr 13, 2019

Just test_parser.py failing for Python 3.5. Any ideas? Looks like it is chopping off the first character.

@mgedmin
Copy link
Copy Markdown
Contributor

mgedmin commented Apr 14, 2019

Just test_parser.py failing for Python 3.5. Any ideas? Looks like it is chopping off the first character.

I tried to pdb through the test, and it's worse than that: the html parser is feeding < and zgür> fahr ' to handler.characters()`, so it doesn't even recognize this as an HTML tag.

The html parser itself is written in C so I can't exactly step through it with pdb.

@anarcat
Copy link
Copy Markdown
Contributor

anarcat commented Apr 15, 2019

we're doing really great here - i've merged a bunch more PRs and we're making good progress. keep the reviews/patches coming! i also think we should close #40 by now, so i'll just do so already.

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Apr 16, 2019

#216, #220, #223 are ready just waiting for an ack I think.

Updated the branch.

Dropped:

  • {python3_01} Python3: update makefiles to test in Python 3 in allow-failures mode

Added some WIP patches to hopefully get the tests passing. We know about the malloc issue and HtmlParser tests. Let's see that everything else is working.

@cjmayo cjmayo force-pushed the python3 branch 2 times, most recently from ccb7518 to 958c7b2 Compare April 21, 2019 18:27
@cjmayo cjmayo force-pushed the python3 branch 2 times, most recently from c295ff8 to 68550f6 Compare April 22, 2019 18:24
@mgedmin
Copy link
Copy Markdown
Contributor

mgedmin commented Apr 24, 2019

We're pretty close to eliminating errors during test collection! Only two changes are needed for that:

Now tox -e py36 collects all the tests and 109 of them even pass! (And then everything hangs because ???? no seriously why? some background thread refuses to join.)

@mgedmin
Copy link
Copy Markdown
Contributor

mgedmin commented Apr 24, 2019

Incidentally, are we measuring test coverage anywhere? I don't think so...

EDIT: duh! #41.

It's just git grep coverage doesn't show much when the coverage is collected with pytest-cov.

coverage report says 69%, which is ... not great.

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Apr 25, 2019

import __init__ as init in tests/checker/test_all_parts.py (I don't remember seeing a PR for this)

The TODO list was quite long... Submitted as #252.

@mgedmin
Copy link
Copy Markdown
Contributor

mgedmin commented Apr 25, 2019

Now tox -e py36 collects all the tests and 109 of them even pass! (And then everything hangs because ???? no seriously why? some background thread refuses to join.)

The hang is caused by tests/checker/test_telnet.py.

@mgedmin
Copy link
Copy Markdown
Contributor

mgedmin commented Apr 25, 2019

The hang is caused by tests/checker/test_telnet.py.

#253 should take care of that.

@mgedmin mgedmin mentioned this pull request May 24, 2019
@SchoolGuy
Copy link
Copy Markdown

SchoolGuy commented Jul 16, 2019

I would be happy to contribute to this. I would open a PR to actually fix the unicode problems with linkchecker but that would break Py2 compatibility. But let's discuss this in the PR I am opening.

Why are you not merging all the work with Python3? What need's to be done to make it finally Py3 compatible!

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Jul 16, 2019

The "final" step is changing over to use Beautiful Soup as the HTML parser. PR #249 tracks that (i.e. does work on Python 3).

But it's built on top of all the other changes, of which we are currently at #230.

@SchoolGuy
Copy link
Copy Markdown

@cjmayo As far as I see you have completed the work. So can I somehow help with getting this into master. At openSUSE we are using a really old version because we want also to ship the GUI which is not working according to my tests with Python3 (Python 2 is by default not installed anymore).

@anarcat
Copy link
Copy Markdown
Contributor

anarcat commented Jul 17, 2019

@SchoolGuy if you want to help, provide a review and suggestions in #230 and #249. thanks!

@anarcat
Copy link
Copy Markdown
Contributor

anarcat commented Sep 5, 2019

if @mgedmin is okay with it, i'd be happy to review this PR as is, it's not as crazy long as it was before. :)

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Sep 5, 2019

Let's see. I can still split it up further. I've tried my script for that: it looks like the first would be a batch of 5, and I can see some potential work on them up already.

@mgedmin
Copy link
Copy Markdown
Contributor

mgedmin commented Sep 6, 2019

I've no objections, as long as I don't have to review massive diffs ;)

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Sep 6, 2019

Sounds like the compromise is carrying on with individual PRs. I've created 22-26, bit rusty when I did that so there were some quick force pushes there.

@mgedmin
Copy link
Copy Markdown
Contributor

mgedmin commented Sep 9, 2019

Force-pushes to PR brancehs are great, I love force-pushes to PR branches.

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Sep 15, 2019

The batch of python3_nn PRs I have just pushed is all the commits from this PR that do not change the HTML parser code. After they have all been merged I propose to submit my cgi.escape patch and then close this PR and move to the Beautiful Soup PR #249. All of the remaining commits from here need testing against Beautiful Soup and not the old parser (which also can't be run with Python 3). It will also make it easier to check there is not overlap or rework, and in general to only have one branch.
Something to think about is how we then merge them into master. I guess it will be best to talk about that on #249 .

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Sep 15, 2019

I have dropped for now:
92945a7 ("Python3: fix loggers", 2019-07-22)
I don't get any test failures because of this. As always of course maybe that means we are missing some tests (I haven't checked).

@anarcat
Copy link
Copy Markdown
Contributor

anarcat commented Sep 16, 2019

this is awesome @cjmayo - keep going! :)

@SchoolGuy
Copy link
Copy Markdown

@cjmayo Thanks for your work! When you are finished I will start reviving the GUI and packaging in the Open Build Service!

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Sep 30, 2019

Brought up-to-date with the latest PRs and dropped:
edff42a ("Python3: fix parser to accept string in parse_feed", 2018-01-06)
which only affected the old HTML parser that will not work with Python 3.

@anarcat
Copy link
Copy Markdown
Contributor

anarcat commented Sep 30, 2019

wow, only 10 commits left, that's great!

@hroncok
Copy link
Copy Markdown
Contributor

hroncok commented Oct 3, 2019

Just a note that I've tried to build git docs in Fedora with current HEAD fo this, 4b3359f, and it failed with trackebacks like:

  File "/usr/lib64/python3.8/site-packages/linkcheck/checker/urlbase.py", line 650, in read_content
    line: buf.write(data)
    locals:
      buf = <local> <_io.StringIO object at 0x7fb1e442e9d0>
      buf.write = <local> <built-in method write of _io.StringIO object at 0x7fb1e442e9d0>
      data = <local> b'<?xml version="1.0" encoding="UTF-8"?>\r\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"\r\n    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\r\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">\r\n<head>\r\n<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UT...
TypeError: string argument expected, got 'bytes'

Full logs: builder-live.log.gz

It previously worked on 72b85c4.

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Oct 4, 2019

Just a note that I've tried to build git docs in Fedora with current HEAD fo this, 4b3359f,

Thanks for commenting. It's probably not clear but this PR has become just a tracker (indeed 33cbd14 disables the parser tests) for us working through the commits from #40 . There are only a few more to go and then we can close it.

#249 is the real Python 3 version. It is currently a bit out of date, hopefully not for long, and a bit experimental, we need to clear some of the non WIP PRs. so please don't expend any time testing right now (do watch for updates though!).
.

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Oct 4, 2019

One more point from the last post: just in case you haven't noticed N.B. I do rebase the [WIP] PRs, it's just easier.

Dropped:
7d4f31a ("Python3: fix setting encoding to HtmlParser", 2018-01-06)
1d2a92d ("Python3: fix other htmllib problems", 2018-01-05)
4b63751 ("Python3: fix test_base", 2018-01-06)
ae2dd3f ("Python3: fix parset charset encoding", 2018-01-06)

not needed with bs4.

@hroncok
Copy link
Copy Markdown
Contributor

hroncok commented Oct 4, 2019

Sure thing, I just wanted to give a heads up in case the failure wasn't known yet.

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Oct 5, 2019

Dropped:
36bc02b ("Python3: fix default encoding in htmllib to allow parsig of entities", 2018-01-07)
8717771 ("Python3: fixes for test_parser.py", 2018-01-06)
7623a51 ("Python3: fix test_parser test for removing invalit entity", 2018-01-07)

"Python3: fix few htmllib problems" is included in #249, with some of its changes moved into "Remove home-cooked htmlparser and use BeautifulSoup", so that's it for this one!

@cjmayo cjmayo closed this Oct 5, 2019
@anarcat
Copy link
Copy Markdown
Contributor

anarcat commented Oct 10, 2019

okay, i'm a bit confused now. are we python-3 yet now? :) or does this happen when we finish the BS4 conversion in #249?

thanks, that's pretty awesome work!

@SchoolGuy
Copy link
Copy Markdown

Well tests are failing so I don't think so. Or is this because they are broken in general?

@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Oct 10, 2019

okay, i'm a bit confused now. are we python-3 yet now? :)

Closing this one just means that all the commits from #40 have been worked through.

or does this happen when we finish the BS4 conversion in #249?

Yes, without bs4 there is no Python 3. #249 is the sole tracker now. Unlike this PR though all the tests are enabled for Python 3 and passing - including some new ones.

@cjmayo cjmayo deleted the python3 branch November 21, 2019 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants