Skip to content

Enable https checking using a test server#338

Merged
anarcat merged 2 commits intolinkchecker:masterfrom
cjmayo:https
Nov 14, 2019
Merged

Enable https checking using a test server#338
anarcat merged 2 commits intolinkchecker:masterfrom
cjmayo:https

Conversation

@cjmayo
Copy link
Copy Markdown
Contributor

@cjmayo cjmayo commented Nov 5, 2019

Previously disabled because it was using Amazon. Verification is disabled because we generate our own certificate.

Also use the generated certificate to test httputil.x509_to_dict() and the fix required for Python 3.

anarcat
anarcat previously approved these changes Nov 5, 2019
Copy link
Copy Markdown
Contributor

@anarcat anarcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Copy Markdown
Contributor

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome!

# Strip pass phrase from the key, else it is requested at server start
subprocess.check_output(["openssl", "pkey",
"-passin", "pass:linkchecker",
"-in", temp_key_file, "-out", get_file("https_key.pem")])
Copy link
Copy Markdown
Contributor

@mgedmin mgedmin Nov 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think this belongs in setUp, not __init__
  2. Surely there's a way of generating a self-signed cert without a password
  3. Generating a new SSL key/cert for every test seems expensive, can we not do that when the key already exists? (And maybe set expiration date to something long -- a decade, a thousand years -- to prevent problems from cert reuse.)
  4. I wonder if pyopenssl (or cryptogtaphy) have APIs for generating SSL keys/self-signed certs, to avoid shelling out.

Copy link
Copy Markdown
Contributor

@mgedmin mgedmin Nov 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re 2: I believe the -nodes argument to openssl req suppresses the password prompt (it means "don't encrypt the key" because openssl's command-line UX is terrible).

See also https://unix.stackexchange.com/a/104305/89841

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Generating a new SSL key/cert for every test seems expensive, can we not do that when the key already exists? (And maybe set expiration date to something long -- a decade, a thousand years -- to prevent problems from cert reuse.)

There should be a way to generate SSL keys with garbage entropy or /dev/urandom so that it's fast. There's such a flag in GnuPG for sure, so maybe there's something similar in openssl?

Seems like a reasonable request anyways.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. I think this belongs in `setUp`, not `__init__`

Is setUp called for each method, the docs aren't really clear? I did think about setUpClass, saw New in 3.2 and didn't use it - however it is in 2.7 and seems to work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re 2: I believe the -nodes argument to openssl req suppresses the password prompt (it means "don't encrypt the key" because openssl's command-line UX is terrible).

That works.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re 4: https://skippylovesmalorie.wordpress.com/2010/02/12/how-to-generate-a-self-signed-certificate-using-pyopenssl/

I will have a go. That link suggests you can set the notAfter time which could make for a better test than just True.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, setUp is called before each test method.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that __init__ is called once for every test method too, while constructing the test suite!

The unittest API requires that each instance of a test case has a runTest() method that runs the test. When you define a test class, the default test loader invokes unittest.MakeSuite(YourTestClass) which creates an empty TestSuite(), enumerates all the test methods of the class, and then instantiates the class by passing the test method name to __init__ to get a test case instance that will run that particular method from its runTest().

So if you wanted to avoid creating new SSL keys for every test method, __init__ is the wrong way to go about it.

(Caveat: I don't know for sure that pytest's unittest compatibility mode also works the same way, but I think it's very likely.)

conn = HTTPConnection("localhost:%d" % port)
if https:
conn = HTTPSConnection("localhost:%d" % port,
context=ssl._create_unverified_context())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a documented API method? I worry because it starts with an underscore.

Copy link
Copy Markdown
Contributor

@mgedmin mgedmin Nov 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_create_unverified_context() is not mentioned in https://docs.python.org/3/library/ssl.html. Let's not use it.

We can inline the implementation from the stdlib and eliminate all the constants from default arguments, which will give us

context = ssl.SSLContext(ssl.PROTOCOL_TLS)
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE

and the docs say that check_hostname is False by default except in one specific circumstance that doesn't apply to us, so this assignment can be omitted. The default value of verify_mode is not documented. I'm pretty sure it is CERT_NONE in all current versions of Python, so we might simplify this to

                conn = HTTPSConnection(..., context=ssl.SSLContext(ssl.PROTOCOL_TLS))

but for readability I'd prefer a top-level constant in the test file

UNVERIFIED_SSL_CONTEXT_FOR_TESTS = ssl.SSLContext(ssl.PROTOCOL_TLS)
...
                conn = HTTPSConnection(..., context=UNVERIFIED_SSL_CONTEXT_FOR_TESTS)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_create_unverified_context() is not mentioned in https://docs.python.org/3/library/ssl.html. Let's not use it.

It came from the http.client.HTTPSConnection documentation:
https://docs.python.org/3/library/http.client.html?highlight=_create_unverified_context#http.client.HTTPSConnection

but I don't mind replacing it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes me feel much better, but also makes me want to SCREAM at Python developers who did this.

conn = HTTPConnection("localhost:%d" % port)
if https:
conn = HTTPSConnection("localhost:%d" % port,
context=ssl._create_unverified_context())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(the above applies here also)

@anarcat anarcat dismissed their stale review November 6, 2019 16:21

mgedmin identified issues so i retract my approval until those are fixed

super(TestHttps, self).__init__(methodName=methodName)
self.handler = CookieRedirectHttpRequestHandler
temp_key_file = get_file("key.pem")
subprocess.check_output(["openssl", "req", "-x509", "-newkey",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should have been subprocess.check_call rather than subprocess.check_output.

Verification has to be turned off because we are using a
self-signed certificate.
  File "/usr/lib/python3.7/site-packages/linkcheck/httputil.py", line 68, in asn1_generaltime_to_seconds
    line: res = datetime.strptime(timestr, timeformat + 'Z')
    locals:
      res = <local> None
      datetime = <global> <class 'datetime.datetime'>
      datetime.strptime = <global> <built-in method strptime of type object at 0x7fa39064dda0>
      timestr = <local> b'20191106202117Z'
      timeformat = <local> '%Y%m%d%H%M%S'
TypeError: strptime() argument 1 must be str, not bytes

pyOpenSSL OpenSSL.crypto.X509.get_notAfter() returns bytes:
https://www.pyopenssl.org/en/stable/api/crypto.html#OpenSSL.crypto.X509.get_notAfter
@cjmayo
Copy link
Copy Markdown
Contributor Author

cjmayo commented Nov 11, 2019

OK, now with key and certificate generation by pyOpenSSL which enables a better check for x509_to_dict:

self.assertEqual(httputil.x509_to_dict(cert)["notAfter"],
                 "Jan 02 03:04:05 2119 GMT")

@mgedmin
Copy link
Copy Markdown
Contributor

mgedmin commented Nov 12, 2019

(a reminder that force-pushes force me to review the entire thing again, since I cannot see just the changes since the last time)

Copy link
Copy Markdown
Contributor

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with one minor change requested!

cert.get_subject().CN = "localhost"
cert.set_serial_number(1000)
cert.gmtime_adj_notBefore(0)
cert.set_notAfter(b"21190102030405Z")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One hundred years, haha!

@anarcat anarcat merged commit c92ab72 into linkchecker:master Nov 14, 2019
@cjmayo cjmayo deleted the https branch November 21, 2019 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants