Skip to content

Use UTF-8 encoding for nginx plugin#5341

Closed
g6123 wants to merge 4 commits intocertbot:masterfrom
g6123:master
Closed

Use UTF-8 encoding for nginx plugin#5341
g6123 wants to merge 4 commits intocertbot:masterfrom
g6123:master

Conversation

@g6123
Copy link
Copy Markdown
Contributor

@g6123 g6123 commented Dec 20, 2017

Updated nginx plugin parser to always use UTF-8 encoding with Python (3.x, 2.x) standard library codecs.

This fixes the issue #5337.

@greencopper
Copy link
Copy Markdown

Is there a manual workaround? I have tried keeping the Nginx config file as small as possible, but still get the error on Arch Linux.

@pde pde assigned ohemorange and bmw and unassigned ohemorange Jan 16, 2018
@g6123
Copy link
Copy Markdown
Contributor Author

g6123 commented Jan 17, 2018

@greencopper It is not really clean way, though you can manually apply the changes to the package. The path would be /usr/lib/python3.6/site-packages/certbot_nginx/parser.py or so.

@Lillian-Violet
Copy link
Copy Markdown

I was running into this problem for weeks, got the workaround thrown at me to find unicode characters in files. Used the command grep -r -P '[^\x00-\x7f]' /etc/nginx /etc/letsencrypt to see where the unicode characters might be, turns out theres a unicode character in the mime-types file (³ to be exact). So remove the line with that character on (by finding it with the above command, don't remember what it was exactly) and you should be golden.

@bmw
Copy link
Copy Markdown
Member

bmw commented Mar 26, 2018

@ohemorange, would you like to review this instead?

@ohemorange ohemorange requested review from ohemorange and removed request for bmw March 26, 2018 23:18
@ohemorange ohemorange assigned ohemorange and unassigned bmw Mar 26, 2018
@ohemorange ohemorange added this to the 0.24.0 milestone Mar 26, 2018
@bmw bmw modified the milestones: 0.24.0, 0.25.0 May 2, 2018
@bmw bmw modified the milestones: 0.25.0, 0.26.0 Jun 6, 2018
@deprito
Copy link
Copy Markdown

deprito commented Jun 10, 2018

Just installed via instruction

But, get the same error...

2018-06-10 21:03:05,538:DEBUG:certbot.error_handler:Calling registered functions
2018-06-10 21:03:05,538:INFO:certbot.auth_handler:Cleaning up challenges
2018-06-10 21:03:06,688:DEBUG:certbot.log:Exiting abnormally:
Traceback (most recent call last):
File "/opt/eff.org/certbot/venv/bin/letsencrypt", line 11, in
sys.exit(main())
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/main.py", line 1323, in main
return config.func(config, plugins)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/main.py", line 1086, in run
certname, lineage)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/main.py", line 120, in _get_and_save_cert
lineage = le_client.obtain_and_enroll_certificate(domains, certname)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/client.py", line 383, in obtain_and_enroll_certificate
cert, chain, key, _ = self.obtain_certificate(domains)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/client.py", line 326, in obtain_certificate
orderr = self._get_order_and_authorizations(csr.data, self.config.allow_subset_of_names)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/client.py", line 362, in _get_order_and_authorizations
authzr = self.auth_handler.handle_authorizations(orderr, best_effort)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/auth_handler.py", line 75, in handle_authorizations
resp = self._solve_challenges(aauthzrs)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/auth_handler.py", line 126, in _solve_challenges
resp = self.auth.perform(all_achalls)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/configurator.py", line 1045, in perform
http_response = http_doer.perform()
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/http_01.py", line 62, in perform
self.configurator.save("HTTP Challenge", True)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/configurator.py", line 975, in save
self.parser.filedump(ext='')
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/parser.py", line 243, in filedump
out = nginxparser.dumps(tree)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/nginxparser.py", line 134, in dumps
return str(RawNginxDumper(blocks.spaced))
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/nginxparser.py", line 98, in str
return ''.join(self)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 34: ordinal not in range(128)
2018-06-10 21:03:06,690:ERROR:certbot.log:An unexpected error occurred:

=========

How to upgrade the certbot to 0.26.0?

@bmw bmw modified the milestones: 0.26.0, 0.27.0 Jul 12, 2018
@schoen
Copy link
Copy Markdown
Contributor

schoen commented Sep 12, 2018

I'm a bit nervous about the possibility that some people's nginx files might also contain binary data or textual data in an encoding other than UTF-8, in which case the Python UTF-8 codec might still crash when trying to read these files.

For example, using Markus Kuhn's stress test file at

https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

if we say

>>> import codecs
>>> f = codecs.open("UTF-8-test.txt", "r", "utf-8")`
>>> x = f.read()

we will get a crash because, indeed, the file isn't valid UTF-8.

We might want to say that nginx configuration files must be valid UTF-8 in order for Certbot to parse them (which seems like a perfectly valid kind of requirement), but in that case we'd probably still want to give a more useful error in this context, maybe like telling the user nginx configuration file {} is not a valid UTF-8 text file and couldn't be parsed or something?

I could imagine that there might be people using other character sets in comments. For instance, suppose a Chinese speaker uses BIG5 encoding in comments. In that case the line

# 共

would be encoded with bytes [35, 32, 166, 64, 10]. Unfortunately this is not only not UTF-8 but also has no interpretation as UTF-8.

>>> (b"# \xa6\x40\n").decode("UTF-8")

ends up giving

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa6 in position 2: invalid start byte

😢

@schoen schoen assigned schoen and unassigned ohemorange Sep 12, 2018
@schoen
Copy link
Copy Markdown
Contributor

schoen commented Sep 12, 2018

@g6123, thank you for doing this work. Do you think you could revise this to catch errors when the file contains non-UTF-8 data? I think we can assume that we can reject such files (rather than adding further complexity about detection of encodings) but we will probably want to give a useful error message to the user in such cases.

I will also ask my colleagues to confirm that they agree with this path.

Copy link
Copy Markdown
Contributor

@schoen schoen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll still need to explicitly catch errors when the files aren't valid as UTF-8, even though perhaps we don't need to proceed in such cases.

@ohemorange
Copy link
Copy Markdown
Contributor

@schoen does nginx itself crash on that test file? we just want to match nginx's behavior, not catch everything

@bmw bmw modified the milestones: 0.27.0, 0.28.0 Sep 13, 2018
@schoen
Copy link
Copy Markdown
Contributor

schoen commented Sep 14, 2018

@ohemorange, sad to say, I've just created an nginx configuration file with a comment containing a Chinese character in BIG5 encoding instead of UTF-8, and nginx happily parsed and accepted this file. But Python3 refuses to read the content of that same file into a UTF-8 string and instead crashes with a UnicodeDecodeError exception.

So I think we do need to deal with this case after all, or else at least some users may continue to get Unicode-related errors even with the improvement made by this PR!

(I would still be OK with having Certbot refuse to process web server configuration files that aren't valid UTF-8, but I think it needs to give a useful error in this situation explaining what happened.)

@schoen
Copy link
Copy Markdown
Contributor

schoen commented Sep 18, 2018

So, we've concluded that we don't necessarily need to match nginx's behavior but we should at least give a friendly error message rather than an error traceback when a non-UTF-8-valid string is encountered in a configuration file. After all, nginx does accept these strings at least inside comments (and maybe they can also be served in web headers or something).

The error message would explain that Certbot is only able to modify nginx configuration files that are valid UTF-8, and indicate which particular file failed to be parsed.

Maybe we could also open a wishlist issue to ask for more closely matching nginx's behavior in this regard (especially if nginx turns out to accept other string encodings outside of comment lines).

@g6123, are you still interested in working on this issue?

@zopieux
Copy link
Copy Markdown

zopieux commented Sep 18, 2018

Fully agree with @schoen analysis. We cannot reasonably create a fully compatible nginx conf parser+writer in Python, this would be too much work to handle all the corner cases of the original nginx parser in C. A pragmatic fallback though, is to fail gracefully with a useful explanation for the user. Having a line number would be a neat addition.

@ohemorange
Copy link
Copy Markdown
Contributor

Hey @g6123, are you no longer able to work on this?

@zopieux, do you have any objections to taking it over?

@g6123
Copy link
Copy Markdown
Contributor Author

g6123 commented Oct 31, 2018

I also agree with @schoen.

@ohemorange I'm working on it, but it would take some time, since I'm not actively developing Python programs. I've added commits that display warning message and return empty result, with some naive tests. If anyone is afford to take this task over, that's perfectly okay.

@bmw bmw modified the milestones: 0.28.0, 0.29.0 Nov 7, 2018
@ohemorange
Copy link
Copy Markdown
Contributor

Ok, be sure to let us know if there's anything we could do to help move it along, including questions or thoughts or anything.

@bmw bmw modified the milestones: 0.29.0, 0.30.0 Dec 11, 2018
@ohemorange
Copy link
Copy Markdown
Contributor

@g6123, @zopieux do you think you'll work on this more? If not we might take it over so it gets in.

@g6123
Copy link
Copy Markdown
Contributor Author

g6123 commented Dec 19, 2018

@ohemorange okay, I guess I'm not. You can take it over.

@zopieux
Copy link
Copy Markdown

zopieux commented Dec 19, 2018

Sorry if I let you folks under the wrong impression but it wasn't my intention to contribute on this. Hope this can be fixed someday though!

@joohoi joohoi self-assigned this Jan 2, 2019
@bmw bmw modified the milestones: 0.30.0, 0.31.0 Jan 9, 2019
@schoen
Copy link
Copy Markdown
Contributor

schoen commented Jan 30, 2019

I've made a new PR #6725. It continues to use @g6123's work so that will show up in the git history (thanks very much, @g6123).

@schoen schoen closed this Jan 30, 2019
@g6123 g6123 mentioned this pull request Feb 23, 2020
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants