dns-rfc2136: find the correct zone/name when CNAME/DNAMEs are used#7244
dns-rfc2136: find the correct zone/name when CNAME/DNAMEs are used#7244hpax wants to merge 9 commits intocertbot:mainfrom
Conversation
Dynamic zones have significant problems with DNSSEC and with redundant servers (which, of course is highly desirable for DNS.) The obvious solution to that is to use a CNAME or DNAME record to point the _acme-challenge to a different zone which can have different NS and TTL properties. In particular, breaking DNSSEC support breaks exactly the chain of trust on which ACME depends, and is thus extremely undesirable. In order to find the correct base zone and name-in-zone when CNAME/DNAMEs might be present, search from the top down instead of the bottom up, and allow non-authoritative answers for anything other than the final SOA. There is no guarantee that the authentication server is authoritative for anything but the zone into which the TXT record is to be placed. If the authentication server disallows recursion, this code will this do the right thing as long as the server is authoritative for the dynamic zone and any zone which contains a CNAME or DNAME record. If that is not the case, then the server must support recursion for its dynamic clients; it obviously does not need to offer that service to the general public. If even this turns out to be unacceptable, then the solution would be to query the normal nameservers (using the system resolver), at least if an !AA !RA response is returned. The dns.resolver module has a zone_for_name() function, but unfortunately it does not return the name-in-zone, and to me its algorithm appears to be incorrect (at least for our purposes) in a way that is similar to the previous dns-rfc2136 code. This patch changes several levels of the interface to use dns.name.Name objects instead of strings, and passes dns.rdata.Rdata objects between _query_soa() and _find_domain(). This turns out to significantly simplify a fair number of things, but requires a fair number of changes to the test suite. Clean up the test suite by implementing a mock resolver with a mapping instead of a simple sequence of return values, and by precomputing dns.name.Name objects for (sub)domains and prefixes used. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
Fix for issue #7243. |
|
@adferrand, are you interested in reviewing this when you magically find spare cycles? @joohoi, can probably help you with testing if you want. |
|
Hello @hpax! Your PR is extremely promising. Before we start the review itself, I would like to talk about few points. First could you clean the errors detected by pylint and mypy in the CI build ? It seems to be several parts of unused code, and it may even fix right now the coverage drop we see with your PR (which makes the coverage job fail). Second, the mechanism you propose is very powerful. I am asking myself if it even covers the specific situation described here #6566, with a PR to solve it here #5350, over which we took ownership with #6644. Because if your PR is also covering the case of automatically resolving a CNAME entry of Third, again, your mechanism is very powerful ^^ It allows basically to always resolve the correct zone to use to insert the TXT entry. It seems to not being tied to any specific functionality of |
|
The Certbot team just talked about this PR a bit in our weekly standup. I'll echo what Adrien said and say that we are very excited about the potential of this PR! Two questions that came out of that conversation that we don't immediately know the answer to are:
One scenario we thought about for question (2) that may be a problem was split-horizon DNS. |
|
[Answering two threads]
On 7/17/19 1:10 AM, Adrien Ferrand wrote:
Hello @hpax <https://github.com/hpax>! Your PR is extremely promising.
Before we start the review itself, I would like to talk about few points.
First could you clean the errors detected by pylint and mypy in the CI
build ? It seems to be several parts of unused code, and it may be even
fix by itself the coverage drop we see with your PR (and which makes the
coverage job fail).
My only problem is that I'm currently buried very very deep. I posted
this because I had to do the most trivial possible forward port after my
own system broke from an update.
It would have been far better if I had gotten a positive response the
first time I submitted it.
Second, the mechanism you propose is very powerful. I am asking myself
if it even covers the specific situation described here #6566
<#6566>, with a PR to solve it
here #5350 <#5350>, over which we
took ownership with #6644
<#6644>.
It ought to. I really isn't any difference than individual host CNAME
records.
Because if your PR is
also covering the case of automatically resolving a CNAME entry of
|_acme-challenge.domain.net| pointing to an external zone entry, we
could just drop the other PR and continue with yours.
This is in fact exactly my own use case.
Third, again, your mechanism is very powerful ^^ It allows basically to
always resolve the correct zone to use to insert the TXT entry. It seems
to not being tied to any specific functionality of |dns-rfc2136| plugin.
How it would be complex/doable to generalize the approach for any
Certbot DNS plugin ? If for instance |domain.net| zone is hosted by
Cloudflare, but |subdomain.domain.net| is delegated to DigitalOcean,
this would allow |certbot certonly --dns-digitalocean -d
test.subdomain.domain.net| work correctly, by calling DigitalOcean API
for |subdomain.domain.net|, not |domain.net|.
It is sort-of-true. Currently it relies on the RFC 2136 DNS resolver to
either allow recursion (for clients allowed to push) or it falls back on
(fairly strong) heuristics.
In order to make it generic, it would need to either walk authoritative
servers all the way from the root (which would break behind some
firewalls) or it would need to do these queries via the system-provided
caching resolvers. The best way to do that would probably be to use
dns.resolver.query() for all the searching instead of piggy backing on
the RFC 2136 code.
One major reason I didn't mess with the generic code is that I don't
know how the various DNS providers do their thing; for example, if the
zone boundaries exposed to the Internet actually match what they
internally use. I would not have either the facilities nor bandwidth to
address that probem.
However, at least for the subcase of a CNAME/DNAME pointing into one of
those zones one would expect that it would Just Work[TM].
The Certbot team just talked about this PR a bit in our weekly standup. I'll echo what Adrien said and say that we are very excited about the potential of this PR!
Two questions that came out of that conversation that we don't immediately know the answer to are:
Are there any situations where landing this PR would break a currently working setup?
I do not believe so, even in the case of an internal nameserver, as it
would still encounter the top-level SOA.
Should this code definitively find the correct zone/name in all situations or is it using heuristics to make the best approximation? In other words, in what scenarios would this automatic resolution be insufficient and you may still need a PR like #6644 to manually tell Certbot which domain needs to be modified?
One scenario we thought about for question (2) that may be a problem was split-horizon DNS.
Basically, if the zone name that has to be modified has a different name
than the terminal SOA from a domain walk, there is simply no way to
discovering it automatically. It isn't even certain that the certbot
client can even see the zone in which the _acme-challenge record needs
to be added in that case. I do believe this falls under the category of
"pathological setups."
-hpa
|
|
Oh wow. Finally. Working code for what EFF recommended here. Anyway, tested the version in this PR. Works. It has my blessing. Dig logs: Bind logs |
|
I wanted to add something with regards to pathological cases. I personally believe that the best way to deal with those is to allow the _acme-challenge record to contain a URI record (RFC 7553) which would allow the challenge to be posted to an arbitrary location on an arbitrary server. The "arbitrary location" is important, because there is no guarantee, nor is there a need in this case, for the domain owner to have control over the http root. This is obviously a change to the ACME protocol, as opposed to a localized change in certbot, but I suspect that consider it sooner rather than later is probably a good idea. The advantage of using a URI record is that the URI record can remain static, and the dynamic aspect is completely divorced from DNS (including potential issues with propagation delays, DNSSEC, users who might not be able to create a dynamic zone, and so on) and just requires access to any part of the namespace of a world-facing http server, which could even be special-cased for this purpose and possibly run on a nonstandard port (one use for this could be the certbot standalone webserver run on a nonstandard port.) The existing web server protocol is not adequate, because:
This makes it impossible to execute a challenge for a hostname which has no web server running, e.g. for a dedicated mail server, and in the case of an outsourced web service, only part of the domain name might be accessible to the user. |
|
There was a comment in here which I can't seem to find anymore (deleted?) about having multiple domains CNAME to the same record breaking the letsencrypt server. The ACME spec for dns-01 explicitly say that multiple TXT records for the same name are permitted, and that the server should accept the request if any one of them is correct, so this would be a bug in the server. That being said, this is probably not the best idea. |
|
@hpax what you're suggesting is interesting but IMHO is entirely another method of solving the challenge. What you worked on in this PR would already help a lot of people and has much better chance to be accepted quickly. I have tested your patch on top of 0.36 (0.35.1 for dns-rfc2136) packages from Debian unstable, and rebuilt for Buster, and except for a minor error in the test which I worked around by commenting the |
|
You are quite correct that it is a completely different solution, at the protocol level, and which has totally different properties. I think the Right Thing[TM] is to do this client only solution first. |
There was a problem hiding this comment.
Hello @hpax. Really sorry it took me so long to come back on this PR.
Several discussions have been done to go further with the potential of this PR, but I think that we should integrate your work first, as it is, to let other Certbot users enjoy benefits from it.
I will open a subsequent issue after this PR is merged to see how we can generalize the implementation to the other plugins.
Since I let the PR rot for several months, I prevent you from the hassle of catching up master, and did it myself, fixing the lint/mypy/coverage in the way along.
Technically the code LGTM, and there is a lot of useful comments to allow someone else or a future you to continue the work if needed.
Last thing I would like: could you write a short entry in CHANGELOG.md? Side-effect is that the last commit in this PR will be yours, and so your ownership will be preserved once it is squashed to master.
|
Per our postmortem a few months ago this PR will need a 2nd review before being merged. |
|
I will note here also that two different Certbot users confirmed that this PR is working as expected. |
|
plz merge |
|
@bmw if community reviews does not count, then could a core dev take care of this extra review then please? |
|
Yep. The few people who are paid to work on Certbot (most of whom only work on the project part time) are stretched pretty thin but we'll take a look at this when we can! |
|
@hpax suggestion for |
|
Does this fix need to hang in limbo indefinitely because @hpax is possibly too busy/awol to write something up for |
|
It's waiting on a 2nd review from a core Certbot maintainer and we've had other commitments keeping us busy such as the work funded by OTF. We're hoping to spend the first couple months of next year working through our PR queue though so hopefully this will be reviewed and merged then. Sorry for the delay! |
Does this concern the actual dns-rfc2136 plugin, or specifically this PR? |
Down to the admin, really. As long as they're cognizant of what's necessary when they run this plugin. Is your change request still valid, given that your rebase fixed it? |
On master there is already a first integration test for a straight-forward case. Now this can be enriched in the scope of this PR to cover the recursion model proposed here. |
Which fix are you referring to? In general the opt-in through a flag is still relevant in my opinion, given that if the admin of the DNS server knows exactly what they are doing, users of this server may not. |
Your 'changes requested' on this PR. Which are done...? |
|
Any news? I have to apply this changes by hand after each Certbot release which is annoying. |
|
Same..... |
|
@adferrand what is the status here? I think you can imagine the level of frustration. It really does not feel one can contribute to this project.
I'm fine with a CLI flag but I would have been interested to know about which situations you're talking about. I have no clue what kind of situation would only break when using this plugin while having the user's service working fine. |
|
I wouldn't expect this PR to be merged for at least a few more months. In the next 6 months or so, we're planning on rethinking our approach to DNS plugins in Certbot since the current situation is not maintainable by the small team here. Until then, we're not planning on adding any new plugins or significant features to existing ones. I know this isn't what you wanted to hear and that it'd be easier for you all if this change landed in our official plugin, but for now if you need this feature I recommend maintaining a 3rd party fork of the plugin. You can find examples of other 3rd party plugins as examples to build off of if you want at https://certbot.eff.org/docs/using.html#third-party-plugins. |
|
This once again brings up something I have pointed out in the past: runtime modification of the DNS entries should not be necessary requiring all these plugins. I would once again like to propose a new challenge protocol added to ACME where DNS ownership is indicated by an URL RR at _acme-challenge indicating an https server to which the verification is delegated. That can be done statically, and the rest reverts to http-01 protocol, but does not need an https server on the actual host the name of which needs to be certified, nor does it require control of the /.well-known URL, or a server on port 80 (as opposed to, say, 443, or a custom port for a transient certbot buildin server.) It is really too sad that this has been blocked for this long, given that this is in fact stated explicitly on LetsEncrypt as a supported mechanism: "Since Let’s Encrypt follows the DNS standards when looking up TXT records for DNS-01 validation, you can use CNAME records or NS records to delegate answering the challenge to other DNS zones. This can be used to delegate the _acme-challenge subdomain to a validation-specific server or zone. It can also be used if your DNS provider is slow to update, and you want to delegate to a quicker-updating server." |
|
Can we plz get this to work? Just replace the current whatever of a plugin with @hpax version. Works. |
|
FWIW, for me personally I have long since given up and switched to using external scripts. |
|
Won’t make a difference as it seems but I’m super sad as well that such an important bug can’t be fixed within 5 years (2018 is the first reference) especially that everything is done and ready. The current DNS challenge with either manual updates or requiring to make the whole zone dynamic/updateable is not at all a solution. |
|
Did a bunch of research and for an enterprise with hundreds of "internal only" TLS sites that need to use something other than HTTP-01 updates, this is one of the only ways to accomplish certs that will never be seen outside of the company. Its unfortunate that we had to commit resources to basically pull a 4 year old PR, and merge it with the current certbot just to get this method working. This also means that we have to dedicate someone to constantly merge the code in and regenerate "custom" RPM packages to deploy this code. |
|
In the light of what I did with #9838, I would like to reassess what we should do about this PR. In previous discussions we said that this approach was extremely useful, and could even be generalized to all DNS plugins. There are two things to cover:
Both approaches are covered by dedicated functions in
I said in #9838 that 1) has to be done for Lexicon-based DNS plugins because of the need of Lexicon to know the actual DNS zone name (and 1) is actually done by Lexicon directly), and that 2) should be the responsibility of the ACME client because a CNAME delegation of As a side note, I also does 2) in my own project So I think we should close both #7244 and #6644 in favor of a new approach to use at least cc @bmw |
|
Questions about this PR. I'd like to setup something like: such that the TXT records for each domain can be created in validate.example.com and used for domain validation for a number of zones and sub-zones. "validate" could probably be something like _rfc2136 to make it less likely to conflict with any name desired for use. The code would need to detect a CNAME then follow the hostnames back till it finds a SOA record and add the TXT record there with the remaining names. ie: dev.example.net in the zone validate.example.com and then remember and delete that entry after validation. From what I read of the spec, I think these entries will validate ok, it's just a matter of certbot understanding what's needed and doing it based on the pre-existing _acme-challenge CNAME. Will this PR, or something like it, show up in certbot anytime soon? I'm on Ubuntu running:
|
|
I pulled down this PR, patched it for current, and tested things out. updated the PR in my fork. timriker:timriker/rfc-2136-cname-fix Works as intended. Please merge. |
|
I don't know anything to add wrt the more generalized approach for dns plugins (I guess it could make sense?) but I do want to add that I've just used the code from this PR and it did fix the issue I was having with a CNAME that intended to point certbot to a sub-zone for the dynamic records. |
|
Also disappointing that this code has not been merged yet. I don't really understand the hesitation. If the concern is that this is somehow too risky, perhaps you'd be willing to accept a change request that would at least provide a visible error when a CNAME is detected when looking for the SOA? |
|
For those looking for a solution, the nsupdate.sh script used with the CNAME and DNAME redirects are followed as per RFC 1034 and RFC 6672. The authenticator.sh and cleanup.sh hook scripts can be found here. |
|
Merged updates from master on my fork with HPA's updates. Still working. Please merge. timriker:timriker/rfc-2136-cname-fix Should I open this as a separate PR? @hpax are you still using this? Are you planning to merge updates on your fork/PR? |
|
Oh, I gave up on it years ago and replaced it with external scripts. |
|
So i ran into the same issue. I configured my domain with a CNAME to redirect it to a special subdomain. And it failed. Then i checked the code, found that basically nothing would respect the CNAME. After a short google search, i ended up here. So this request is now 5 years old (behold, next year we need to send it to School), i was really surprised that it never made it. I just tested it with the branch from @timriker and it just worked as expected. So i wonder why we don't just finish this PR? Whats exactly missing? |
Dynamic zones have significant problems with DNSSEC and with redundant
servers (which, of course is highly desirable for DNS.) The obvious
solution to that is to use a CNAME or DNAME record to point the
_acme-challenge to a different zone which can have different NS and
TTL properties. In particular, breaking DNSSEC support breaks exactly
the chain of trust on which ACME depends, and is thus extremely
undesirable.
In order to find the correct base zone and name-in-zone when
CNAME/DNAMEs might be present, search from the top down instead of the
bottom up, and allow non-authoritative answers for anything other than
the final SOA. There is no guarantee that the authentication server is
authoritative for anything but the zone into which the TXT record is
to be placed.
If the authentication server disallows recursion, this code will this
do the right thing as long as the server is authoritative for the
dynamic zone and any zone which contains a CNAME or DNAME record. If
that is not the case, then the server must support recursion for its
dynamic clients; it obviously does not need to offer that service to
the general public. If even this turns out to be unacceptable, then
the solution would be to query the normal nameservers (using the
system resolver), at least if an !AA !RA response is returned. The
dns.resolver module has a zone_for_name() function, but unfortunately
it does not return the name-in-zone, and to me its algorithm appears
to be incorrect (at least for our purposes) in a way that is similar
to the previous dns-rfc2136 code.
This patch changes several levels of the interface to use
dns.name.Name objects instead of strings, and passes dns.rdata.Rdata
objects between _query_soa() and _find_domain(). This turns out to
significantly simplify a fair number of things, but requires a fair
number of changes to the test suite. Clean up the test suite by
implementing a mock resolver with a mapping instead of a simple
sequence of return values, and by precomputing dns.name.Name objects
for (sub)domains and prefixes used.
Signed-off-by: H. Peter Anvin hpa@zytor.com
Pull Request Checklist
mastersection ofCHANGELOG.mdto include a description ofthe change being made.
annotations
for any functions that were added or modified.
AUTHORS.mdif you like.