{python3_20} Python3: decode parts before submitting them to urllib.quote()#230
anarcat merged 1 commit into linkchecker:master
|
Apply after #227; this is rebased on python3_17, which is the first commit in this PR. |
Force-pushed from a296b71 to abedcd1.
        query = query.encode(encoding, 'ignore')
    # if ? is in the query, split it off, seen at msdn.microsoft.com
    append = ""
    query = decode_for_unquote(query)
This feels wrong. We've only just encoded it to encoding or url_encoding (which is not necessarily UTF-8) three lines up from here, and now we're decoding as UTF-8.
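The concern above can be illustrated with a minimal sketch. It assumes a non-UTF-8 encoding, with ISO-8859-1 picked purely for illustration; the actual value of `url_encoding` is not shown in this thread, and the helper name `roundtrip_as_utf8` is hypothetical.

```python
# Illustrative only: ISO-8859-1 stands in for a non-UTF-8 `encoding`.
text = "caf\u00e9"                           # the text 'café'
raw = text.encode("iso-8859-1", "ignore")    # one byte per character here


def roundtrip_as_utf8(data):
    """Try to decode bytes as UTF-8; return None instead of raising."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Decoding with a different codec than the one used to encode fails here,
# because the lone byte 0xE9 is not a complete UTF-8 sequence.
mismatched = roundtrip_as_utf8(raw)

# Decoding with the same codec recovers the original text.
matched = raw.decode("iso-8859-1")
```

For pure-ASCII queries the two codecs agree, which may be why a mismatch like this can go unnoticed in tests.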
I tried taking it out and got 67 failures including:
_____________________________________________ TestMailBad.test_error_mail ______________________________________________
[gw14] linux -- Python 3.6.8 /usr/bin/python3.6
self = <tests.checker.test_mail_bad.TestMailBad testMethod=test_error_mail>
    def test_error_mail (self):
        # too long or too short
>       self.mail_error(u"mailto:@")
tests/checker/test_mail_bad.py:28:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/checker/__init__.py:279: in mail_error
    return self.mail_test(addr, u"error", **kwargs)
tests/checker/__init__.py:283: in mail_test
    url = self.norm(addr)
tests/checker/__init__.py:191: in norm
    return linkcheck.url.url_norm(url, encoding=encoding)[0]
linkcheck/url.py:334: in url_norm
    urlparts[3] = url_parse_query(urlparts[3], encoding=encoding)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
query = b'', encoding = None
    def url_parse_query (query, encoding=None):
        """Parse and re-join the given CGI query."""
        if isinstance(query, str_text):
            if encoding is None:
                encoding = url_encoding
            query = query.encode(encoding, 'ignore')
        # if ? is in the query, split it off, seen at msdn.microsoft.com
        append = ""
>       while '?' in query:
E       TypeError: a bytes-like object is required, not 'str'
linkcheck/url.py:278: TypeError
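The failure in this traceback can be reproduced in isolation on Python 3. The sketch below is not code from linkchecker; the helper `contains_question_mark` is a hypothetical name showing one way to keep the needle and the haystack the same type.

```python
query = b""  # what the traceback shows url_parse_query holding after .encode()


def contains_question_mark(q):
    """Membership test using a needle of the same type as the haystack."""
    needle = b"?" if isinstance(q, bytes) else "?"
    return needle in q


try:
    "?" in query          # str needle, bytes haystack
    mixed_ok = True
except TypeError:
    mixed_ok = False      # Python 3 refuses to mix str and bytes here
```

On Python 2 the same expression works because `str` is already a byte string, which is why the encode step only breaks under Python 3.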
On the other hand removing the section above:
--- a/linkcheck/url.py
+++ b/linkcheck/url.py
@@ -269,10 +269,6 @@ def url_fix_wayback_query(path):
 def url_parse_query (query, encoding=None):
     """Parse and re-join the given CGI query."""
-    if isinstance(query, str_text):
-        if encoding is None:
-            encoding = url_encoding
-        query = query.encode(encoding, 'ignore')
     # if ? is in the query, split it off, seen at msdn.microsoft.com
     append = ""
     query = decode_for_unquote(query)
and the python3 branch passes tests for 2.7 and 3.6.
Although there is a warning with 2.7:
linkcheck/checker/ftpurl.py:140
/var/tmp/portage/net-analyzer/linkchecker-9999-r100/work/linkchecker-9999/linkcheck/checker/ftpurl.py:140: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if self.filename in files:
I'm tempted to ignore that though and just press on bearing in mind we will go Python 3 only at the end of this.
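For readers following along, the general idea of decoding a URL part before quoting it might look like the sketch below. The body of `decode_for_unquote` is not shown in this conversation, so `decode_if_bytes` and `requote` are hypothetical stand-ins, the UTF-8 default is an assumption, and the sketch targets Python 3 only.

```python
from urllib.parse import quote, unquote


def decode_if_bytes(part, encoding="utf-8"):
    """Hypothetical stand-in: return text whether given bytes or str."""
    if isinstance(part, bytes):
        return part.decode(encoding, "replace")
    return part


def requote(part):
    """Unquote, then re-quote a query component, always working on str."""
    text = decode_if_bytes(part)
    # '=' and '&' are kept literal so key/value structure survives.
    return quote(unquote(text), safe="/=&")


from_bytes = requote(b"a=b%20c")
from_text = requote("a=b c")
```

With the input normalized to `str` up front, the later string operations no longer need to care which type the caller passed in.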
Although there is a warning with 2.7:
linkcheck/checker/ftpurl.py:140
/var/tmp/portage/net-analyzer/linkchecker-9999-r100/work/linkchecker-9999/linkcheck/checker/ftpurl.py:140: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if self.filename in files:
Actually looks like that is nothing to do with this, but a consequence of enabling the FTP test.
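As background on the warning itself: on Python 2 it appears when a byte string containing non-ASCII bytes is compared with a unicode string. The sketch below shows the Python 3 counterpart, where the mixed comparison is silently unequal. The types of `self.filename` and `files` in ftpurl.py are not shown here, so the values and the helper `as_text` are assumptions for illustration.

```python
def as_text(name, encoding="utf-8"):
    """Decode bytes to str; pass str through unchanged."""
    return name.decode(encoding, "replace") if isinstance(name, bytes) else name


filename = b"r\xc3\xa9sum\xc3\xa9.txt"          # bytes, as if read off the wire
files = ["r\u00e9sum\u00e9.txt", "notes.txt"]   # text directory listing

# Mixed types: bytes never compare equal to str on Python 3.
naive_hit = filename in files

# Same types on both sides: the membership test behaves as intended.
normalized_hit = as_text(filename) in [as_text(f) for f in files]
```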
I'm tempted to ignore that though and just press on bearing in mind we will go Python 3 only at the end of this.
I'm tempted to do that as well since I cannot muster the energy to analyze all the sources to figure out what's the right thing to do....
|
Rebased on master and added a commit removing the encoding, which I will squash in if agreed. |
|
FWIW, I am testing this merged into a version of #194 rebased on master, which adds tests of anchors (the merge is tagged and pushed to my fork at https://github.com/yarikoptic/linkchecker/releases/tag/9.4.0.anchorfix2%2Bpy3fix if needed to reproduce), and getting |
I think you're testing: Despite the name, none of these python3_nn branches work on Python 3. For this specific error I guess the commit "Python3: fix url.endswith in url.py" (1528b4c) is needed. I test Python 3 against the htmlparser-beautifulsoup branch in #249. The anchors work sounds worth pursuing, but beware that this branch does get rebased. |
|
OK. Squashed into one commit for merging. |
SchoolGuy
left a comment
This looks like a good fix for keeping Py2 byte strings internally and converting to Unicode when needed. Once we have a stable Py3 version, I would suggest using the Py3 str type as much as possible.
|
@SchoolGuy did you test the new code at all? @yarikoptic had problems with it... @yarikoptic do you think your errors are related to #194 or inherent to this PR? |
|
@anarcat I wasn't able to even get the C stuff to work with the provided docs. The tests are always failing because they are not able to find the C extensions.
@anarcat: this seems to be unrelated to #194, since it fails on the current head of this PR:
(git)hopa:~/proj/misc/linkchecker[pull/origin/230/head]git
$> git describe
fatal: No annotated tags can describe 'a6643034fbb442452092353b8ffcb8ae77cf8279'.
However, there were unannotated tags: try --tags.
(dev3) 1 11512 ->128 [1].....................................:Thu 18 Jul 2019 10:21:45 AM EDT:.
(git)hopa:~/proj/misc/linkchecker[pull/origin/230/head]git
$> git describe --tags
v9.4.0-132-ga6643034
(dev3) 1 11513 [1].....................................:Thu 18 Jul 2019 10:21:47 AM EDT:.
(git)hopa:~/proj/misc/linkchecker[pull/origin/230/head]git
$> python -m pytest -s -v --tb=short --pdb tests/checker/test_anchor.py
====================================================== test session starts ======================================================
platform linux -- Python 3.7.3rc1, pytest-3.10.1, py-1.7.0, pluggy-0.8.0 -- /home/yoh/proj/misc/linkchecker/venvs/dev3/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/yoh/proj/misc/linkchecker/.hypothesis/examples')
rootdir: /home/yoh/proj/misc/linkchecker, inifile:
plugins: localserver-0.5.0, hypothesis-3.71.11
collected 1 item
tests/checker/test_anchor.py::TestAnchor::test_anchor FAILED
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
tests/checker/test_anchor.py:31: in test_anchor
    nurl = self.norm(url)
tests/checker/__init__.py:191: in norm
    return linkcheck.url.url_norm(url, encoding=encoding)[0]
linkcheck/url.py:353: in url_norm
    if url.endswith('#') and not urlparts[4]:
E   TypeError: endswith first arg must be bytes or a tuple of bytes, not str
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /home/yoh/proj/misc/linkchecker/linkcheck/url.py(353)url_norm()
-> if url.endswith('#') and not urlparts[4]:
(Pdb) |
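The `endswith` failure above is the same str/bytes mismatch seen earlier. A minimal reproduction on Python 3, with one type-agnostic workaround, could look like this. It is a sketch only and not necessarily what any commit in this repository does; `ends_with_hash` is a hypothetical helper.

```python
def ends_with_hash(url):
    """Check for a trailing '#' using a suffix of the same type as url."""
    suffix = b"#" if isinstance(url, bytes) else "#"
    return url.endswith(suffix)


try:
    b"http://example.com/#".endswith("#")   # bytes object, str suffix
    mixed_ok = True
except TypeError:
    mixed_ok = False                         # Python 3 rejects the mix
```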
I think the simplest way to test is to type I can't use that inside a Gentoo ebuild so have added the function:
python_test() {
    cp "${BUILD_DIR}"/lib/linkcheck/HtmlParser/htmlsax*.so linkcheck/HtmlParser/ || die
    cp "${BUILD_DIR}"/lib/linkcheck/network/_network*.so linkcheck/network/ || die
    PYTHONMALLOC=malloc pytest -n $(makeopts_jobs) tests || die
}
|
I don't expect any of the {python3_nn} PRs to pass the tests with Python 3. Some may pass specific tests - I guess the ones with tests named in the titles. I have been using #210 and #249 for Python 3 testing, updating against master to check off these {python3_nn} ones - N.B. that does mean they get rebased. |
Okay. Even though I find it weird to use a testing tool to build the project, I will try that tomorrow at work. |
mgedmin
left a comment
YOLO, let's merge this and see what happens.
|
fire in the hole. |