Use UTF-8 encoding for nginx plugin by g6123 · Pull Request #5341 · certbot/certbot

g6123 · 2017-12-20T11:54:29Z

Updated the nginx plugin parser to always use UTF-8 encoding, using the Python standard library codecs (works on both Python 3.x and 2.x).

This fixes issue #5337.
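A minimal sketch of the idea: open configuration files with an explicit UTF-8 encoding for both reading and writing, instead of relying on the platform default, which can be ASCII (for example under a C/POSIX locale). The helper names read_config and write_config are hypothetical and are not Certbot's actual API. The sketch uses io.open because it accepts an encoding argument and returns text on both Python 2 and 3.

```python
import io


def read_config(path):
    """Return the text of an nginx config file, decoded as UTF-8 (hypothetical helper)."""
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()


def write_config(path, text):
    """Write text back to an nginx config file, encoded as UTF-8 (hypothetical helper)."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

Because the encoding is fixed, a file containing characters such as ³ or ’ round-trips the same way regardless of the user's locale.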

greencopper · 2017-12-27T04:51:51Z

Is there a manual workaround? I have tried keeping the Nginx config file as small as possible, but I still get the error on Arch Linux.

g6123 · 2018-01-17T08:16:46Z

@greencopper It is not a clean way, but you can manually apply the changes to the installed package. The path would be /usr/lib/python3.6/site-packages/certbot_nginx/parser.py or similar.

Lillian-Violet · 2018-03-07T19:39:54Z

I ran into this problem for weeks until someone pointed me to the workaround of searching the files for Unicode characters. I used the command grep -r -P '[^\x00-\x7f]' /etc/nginx /etc/letsencrypt to see where the non-ASCII characters were, and it turns out there's one in the mime-types file (³, to be exact). Find the line containing that character with the command above (I don't remember exactly which line it was), remove it, and you should be golden.
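The same search can be done from Python, and this version also reports line numbers. The helper below is hypothetical and is not part of Certbot. It walks the given directories, reads each file as raw bytes, and reports every line that contains a byte outside the ASCII range (0x00 to 0x7F). That is the same criterion as the grep command.

```python
import os
import sys


def find_non_ascii(roots):
    """Yield (path, line_number, line_bytes) for lines containing non-ASCII bytes."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                try:
                    f = open(path, "rb")  # raw bytes: no decoding can fail here
                except (IOError, OSError):
                    continue  # unreadable file (permissions, broken symlink, ...)
                with f:
                    for lineno, line in enumerate(f, start=1):
                        # bytearray iterates as ints on both Python 2 and 3
                        if any(b > 0x7F for b in bytearray(line)):
                            yield path, lineno, line


if __name__ == "__main__":
    for path, lineno, line in find_non_ascii(sys.argv[1:] or ["/etc/nginx"]):
        print("%s:%d: %r" % (path, lineno, line))
```

Usage, assuming the script is saved as find_non_ascii.py: python find_non_ascii.py /etc/nginx /etc/letsencrypt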

bmw · 2018-03-26T23:14:51Z

@ohemorange, would you like to review this instead?

deprito · 2018-06-10T14:05:43Z

I just installed Certbot following the instructions, but I get the same error:

2018-06-10 21:03:05,538:DEBUG:certbot.error_handler:Calling registered functions
2018-06-10 21:03:05,538:INFO:certbot.auth_handler:Cleaning up challenges
2018-06-10 21:03:06,688:DEBUG:certbot.log:Exiting abnormally:
Traceback (most recent call last):
  File "/opt/eff.org/certbot/venv/bin/letsencrypt", line 11, in <module>
    sys.exit(main())
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/main.py", line 1323, in main
    return config.func(config, plugins)
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/main.py", line 1086, in run
    certname, lineage)
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/main.py", line 120, in _get_and_save_cert
    lineage = le_client.obtain_and_enroll_certificate(domains, certname)
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/client.py", line 383, in obtain_and_enroll_certificate
    cert, chain, key, _ = self.obtain_certificate(domains)
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/client.py", line 326, in obtain_certificate
    orderr = self._get_order_and_authorizations(csr.data, self.config.allow_subset_of_names)
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/client.py", line 362, in _get_order_and_authorizations
    authzr = self.auth_handler.handle_authorizations(orderr, best_effort)
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/auth_handler.py", line 75, in handle_authorizations
    resp = self._solve_challenges(aauthzrs)
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/auth_handler.py", line 126, in _solve_challenges
    resp = self.auth.perform(all_achalls)
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/configurator.py", line 1045, in perform
    http_response = http_doer.perform()
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/http_01.py", line 62, in perform
    self.configurator.save("HTTP Challenge", True)
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/configurator.py", line 975, in save
    self.parser.filedump(ext='')
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/parser.py", line 243, in filedump
    out = nginxparser.dumps(tree)
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/nginxparser.py", line 134, in dumps
    return str(RawNginxDumper(blocks.spaced))
  File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/nginxparser.py", line 98, in __str__
    return ''.join(self)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 34: ordinal not in range(128)
2018-06-10 21:03:06,690:ERROR:certbot.log:An unexpected error occurred:
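The last line of the traceback carries the key information. Byte 0xe2 is outside ASCII, and it is the first byte of the UTF-8 encoding of many characters, such as the typographic quote U+2019. Under Python 2.7, the decode in the traceback was implicit, probably triggered by mixing byte strings and unicode strings in ''.join. The following is a minimal Python 3 reproduction. It assumes the configuration contained such a character, and the helper try_decode is hypothetical.

```python
def try_decode(data, encoding):
    """Return (text, None) on success or (None, error_message) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# A config comment containing U+2019, which UTF-8 encodes as E2 80 99.
line = b"# it\xe2\x80\x99s\n"

text, err = try_decode(line, "ascii")
print(err)  # 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)

text, err = try_decode(line, "utf-8")
print(text == u"# it\u2019s\n")  # True: the same bytes are valid UTF-8
```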

=========

How do I upgrade Certbot to 0.26.0?

schoen · 2018-09-12T22:49:33Z

I'm a bit nervous about the possibility that some people's nginx files might also contain binary data or textual data in an encoding other than UTF-8, in which case the Python UTF-8 codec might still crash when trying to read these files.

For example, using Markus Kuhn's stress test file at

https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

if we say

>>> import codecs
>>> f = codecs.open("UTF-8-test.txt", "r", "utf-8")
>>> x = f.read()

we will get a crash because, indeed, the file isn't valid UTF-8.

We might want to say that nginx configuration files must be valid UTF-8 in order for Certbot to parse them (which seems like a perfectly reasonable requirement), but in that case we'd probably still want to give a more useful error here, perhaps telling the user something like "nginx configuration file {} is not a valid UTF-8 text file and couldn't be parsed"?
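One way to do this is sketched below. It reads the file as UTF-8, catches UnicodeDecodeError, and re-raises an error that names the file. ConfigEncodingError and read_utf8_config are hypothetical names invented for this sketch. Inside Certbot, a plugin would presumably raise one of Certbot's own error types instead.

```python
import io


class ConfigEncodingError(Exception):
    """Hypothetical error raised when a config file is not valid UTF-8."""


def read_utf8_config(path):
    """Read path as UTF-8, turning decode failures into a readable error."""
    try:
        with io.open(path, "r", encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        # Replace the bare traceback with a message that names the file.
        raise ConfigEncodingError(
            "Certbot can only modify nginx configuration files that are "
            "valid UTF-8, but %s is not valid UTF-8 text." % path)
```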

I could imagine that there might be people using other character sets in comments. For instance, suppose a Chinese speaker uses BIG5 encoding in comments. In that case the line

# 共

would be encoded as the bytes [35, 32, 166, 64, 10]. Unfortunately, this is not only different from the UTF-8 encoding of that line but is not valid UTF-8 at all.

>>> (b"# \xa6\x40\n").decode("UTF-8")

ends up giving

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa6 in position 2: invalid start byte

😢

schoen · 2018-09-12T22:51:43Z

@g6123, thank you for doing this work. Do you think you could revise this to catch errors when the file contains non-UTF-8 data? I think we can assume that we can reject such files (rather than adding further complexity about detection of encodings) but we will probably want to give a useful error message to the user in such cases.

I will also ask my colleagues to confirm that they agree with this path.

schoen

I think we'll still need to explicitly catch errors when the files aren't valid UTF-8, even though we perhaps don't need to proceed in such cases.

ohemorange · 2018-09-13T00:17:31Z

@schoen, does nginx itself crash on that test file? We just want to match nginx's behavior, not catch everything.

schoen · 2018-09-14T00:37:56Z

@ohemorange, sad to say, I've just created an nginx configuration file with a comment containing a Chinese character in BIG5 encoding instead of UTF-8, and nginx happily parsed and accepted this file. But Python3 refuses to read the content of that same file into a UTF-8 string and instead crashes with a UnicodeDecodeError exception.

So I think we do need to deal with this case after all, or else at least some users may continue to get Unicode-related errors even with the improvement made by this PR!

(I would still be OK with having Certbot refuse to process web server configuration files that aren't valid UTF-8, but I think it needs to give a useful error in this situation explaining what happened.)

schoen · 2018-09-18T00:56:23Z

So, we've concluded that we don't necessarily need to match nginx's behavior but we should at least give a friendly error message rather than an error traceback when a non-UTF-8-valid string is encountered in a configuration file. After all, nginx does accept these strings at least inside comments (and maybe they can also be served in web headers or something).

The error message would explain that Certbot is only able to modify nginx configuration files that are valid UTF-8, and indicate which particular file failed to be parsed.

Maybe we could also open a wishlist issue to ask for more closely matching nginx's behavior in this regard (especially if nginx turns out to accept other string encodings outside of comment lines).

@g6123, are you still interested in working on this issue?

zopieux · 2018-09-18T19:44:00Z

Fully agree with @schoen's analysis. We cannot reasonably create a fully compatible nginx conf parser and writer in Python; it would be too much work to handle all the corner cases of the original nginx parser, which is written in C. A pragmatic fallback, though, is to fail gracefully with a useful explanation for the user. Having a line number would be a neat addition.
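A line number can be recovered by decoding the raw bytes directly. UnicodeDecodeError.start is the byte offset of the first bad byte, so counting newline bytes before that offset gives the line. The function name below is hypothetical. The sketch assumes the whole file fits in memory, which should hold for typical nginx configs.

```python
def utf8_error_location(data):
    """Return (line, column) of the first invalid UTF-8 byte in data, or None.

    Both numbers are 1-based; the column counts bytes, not characters.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        before = data[:exc.start]
        line = before.count(b"\n") + 1
        # rfind returns -1 when the bad byte is on the first line
        column = exc.start - (before.rfind(b"\n") + 1) + 1
        return line, column
    return None


print(utf8_error_location(b"server {\n# \xa6\x40\n}\n"))  # (2, 3)
```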

ohemorange · 2018-10-10T19:24:48Z

Hey @g6123, are you no longer able to work on this?

@zopieux, do you have any objections to taking it over?


g6123 · 2018-10-31T08:28:46Z

I also agree with @schoen.

@ohemorange I'm working on it, but it will take some time, since I'm not actively developing Python programs. I've added commits that display a warning message and return an empty result, with some naive tests. If anyone can take this task over, that's perfectly okay.
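The "warn and return an empty result" approach might look roughly like the sketch below. It is a hedged illustration, not the code from the PR's commits, and load_or_skip and the logger name are hypothetical. One trade-off is worth noting: when a file is skipped, the plugin does not see any server blocks defined in it.

```python
import io
import logging

logger = logging.getLogger("nginx_parser_sketch")  # hypothetical logger name


def load_or_skip(path):
    """Return the file's text, or an empty string (with a warning) if it is not UTF-8."""
    try:
        with io.open(path, "r", encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        logger.warning(
            "Could not read file %s: it is not valid UTF-8. Certbot can only "
            "modify nginx configuration files that are valid UTF-8, so this "
            "file will be ignored.", path)
        return u""
```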

ohemorange · 2018-11-08T23:55:33Z

Ok, be sure to let us know if there's anything we could do to help move it along, including questions or thoughts or anything.

ohemorange · 2018-12-13T20:35:36Z

@g6123, @zopieux do you think you'll work on this more? If not we might take it over so it gets in.

g6123 · 2018-12-19T04:36:59Z

@ohemorange Okay, I guess I won't. You can take it over.

zopieux · 2018-12-19T23:56:14Z

Sorry if I gave you folks the wrong impression, but it wasn't my intention to contribute to this. I hope this can be fixed someday, though!

schoen · 2019-01-30T22:30:17Z

I've made a new PR #6725. It continues to use @g6123's work so that will show up in the git history (thanks very much, @g6123).

Use UTF-8 encoding for nginx plugin

e745059

g6123 mentioned this pull request Dec 20, 2017

Nginx plugin crashes when non-ascii configuration file read #5337

Closed

SwartzCr requested a review from bmw December 20, 2017 21:11

pde assigned ohemorange and bmw and unassigned ohemorange Jan 16, 2018

bmw mentioned this pull request Mar 2, 2018

UnicodeDecodeError: 'utf-8' codec can't decode #5646

Closed

ohemorange requested review from ohemorange and removed request for bmw March 26, 2018 23:18

ohemorange assigned ohemorange and unassigned bmw Mar 26, 2018

ohemorange added this to the 0.24.0 milestone Mar 26, 2018

ohemorange added area: code health area: encoding/type error labels Mar 26, 2018

bmw modified the milestones: 0.24.0, 0.25.0 May 2, 2018

bmw modified the milestones: 0.25.0, 0.26.0 Jun 6, 2018

bmw modified the milestones: 0.26.0, 0.27.0 Jul 12, 2018

ohemorange added the current sprint label Sep 12, 2018

schoen assigned schoen and unassigned ohemorange Sep 12, 2018

schoen suggested changes Sep 12, 2018


bmw modified the milestones: 0.27.0, 0.28.0 Sep 13, 2018

ohemorange mentioned this pull request Sep 18, 2018

unable to run certbot --nginx - UnicodeDecodeError: 'ascii' #5236

Closed

bmw removed the current sprint label Oct 17, 2018

g6123 added 2 commits October 31, 2018 16:47

Use io module instead of codecs

71bc3e0

See https://mail.python.org/pipermail/python-list/2015-March/687124.html
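One commonly cited reason to prefer io.open over codecs.open is documented in the Python codecs docs. We have not confirmed that it is the specific point of the linked post. According to those docs, codecs.open always opens the underlying file in binary mode, so it performs no newline translation. By contrast, io.open in text mode translates "\r\n" to "\n" by default. The snippet below shows the difference on a file with Windows line endings; read_both is a hypothetical helper.

```python
import codecs
import io
import os
import tempfile


def read_both(raw_bytes):
    """Write raw_bytes to a temp file and read it back via codecs.open and io.open."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw_bytes)
        with codecs.open(path, "r", "utf-8") as f:
            via_codecs = f.read()  # binary mode underneath: "\r\n" is kept
        with io.open(path, "r", encoding="utf-8") as f:
            via_io = f.read()  # universal newlines: "\r\n" becomes "\n"
        return via_codecs, via_io
    finally:
        os.remove(path)


print(read_both(b"server {\r\n}\r\n"))
# ('server {\r\n}\r\n', 'server {\n}\n')
```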

Added test for valid/invalid unicode characters

b50cddc

bmw modified the milestones: 0.28.0, 0.29.0 Nov 7, 2018

Merge branch 'master' into master

7f7829b

bmw modified the milestones: 0.29.0, 0.30.0 Dec 11, 2018

bmw added the current sprint label Jan 2, 2019

joohoi self-assigned this Jan 2, 2019

bmw modified the milestones: 0.30.0, 0.31.0 Jan 9, 2019

schoen mentioned this pull request Jan 30, 2019

Use UTF-8 encoding for nginx plugin #6725

Closed

schoen closed this Jan 30, 2019

g6123 mentioned this pull request Feb 23, 2020

Use UTF-8 encoding for nginx plugin #7797

Merged


9 participants

schoen commented Sep 14, 2018 •

edited

Loading

g6123 commented Oct 31, 2018 •

edited

Loading