[ca] Extricate ExternalCA from SecurityConfig#2262
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2262 +/- ##
==========================================
- Coverage 60.35% 60.27% -0.09%
==========================================
Files 128 128
Lines 25979 25994 +15
==========================================
- Hits 15679 15667 -12
- Misses 8906 8936 +30
+ Partials 1394 1391 -3 |
ca/server.go
Outdated
|
|
||
| // ExternalCA returns the current external CA | ||
| func (s *Server) ExternalCA() *ExternalCA { | ||
| // don't need a lock because this never gets updated |
There was a problem hiding this comment.
It does get updated in UpdateRootCA.
There was a problem hiding this comment.
Ah right sorry, it wasn't in a previous version, and I forgot to update.
ca/server.go
Outdated
| } | ||
|
|
||
| // filterAndSaveExternalCAURLS saves a copy of the external CAs to the server, and also returns a list of external CA | ||
| // urls filtered by the desired cert. This function assumes that something else has called s.secConfugMu.Lock() and |
There was a problem hiding this comment.
I guess it's more like saveAndFilter?
There was a problem hiding this comment.
:D I should probably change that name now to signingMu or something.
2c6e06f to
73c0261
Compare
ca/server.go
Outdated
| if rootCAChanged || externalCAChanged { | ||
| // we want to update only if the external CA URLS have changed, since if the root CA has changed we already | ||
| // run similar logic | ||
| if !rootCAChanged && externalCAChanged { |
There was a problem hiding this comment.
Could be else if externalCAChanged {, but fine with this if it's more readable to be redundant.
ca/server.go
Outdated
| // will also unlock it | ||
| func (s *Server) saveAndFilterExternalCAURLS(ctx context.Context, desiredCert, defaultCert []byte, apiExternalCAs []*api.ExternalCA) (urls []string) { | ||
| desiredCert = NormalizePEMs(desiredCert) | ||
| s.lastSeenExternalCAs = make([]*api.ExternalCA, len(apiExternalCAs)) |
There was a problem hiding this comment.
I think this is missing a , 0
There was a problem hiding this comment.
Oops, yes, sorry, although I actually just left this and instead of doing append, set s.lastSeenExternalCAs[i] = .... Don't feel strongly or anything, just in mind setting the values makes it slightly more clear that we're copying rather than creating something new, but that's just how I tend to read these things.
There was a problem hiding this comment.
I have a slight preference for going back to the old code's s.lastSeenExternalCAs = cluster.Spec.CAConfig.Copy().ExternalCAs expression, moved outside this function.
The reason is that saving the URL list and filtering it are different operations, so I think keeping them separate is more understandable.
ca/server.go
Outdated
| Certificates: s.securityConfig.ClientTLSCreds.Config().Certificates, | ||
| RootCAs: s.externalCAPool, | ||
| MinVersion: tls.VersionTLS12, | ||
| }, urls...) |
There was a problem hiding this comment.
Can this use NewExternalCATLSConfig to avoid hardocding the minimum version?
manager/controlapi/ca_rotation.go
Outdated
| Certificates: securityConfig.ClientTLSCreds.Config().Certificates, | ||
| RootCAs: rootPool, | ||
| MinVersion: tls.VersionTLS12, | ||
| }, urls...).CrossSignRootCA(ctx, newCARootCA) |
There was a problem hiding this comment.
Can this use NewExternalCATLSConfig to avoid hardocding the minimum version?
b94298b to
4a0490f
Compare
ExternalCA from SecurityConfigExternalCA from SecurityConfig
|
I've added an interface to |
ca/server.go
Outdated
|
|
||
| // SecurityConfigTLSUpdater takes a security config and sets a watch so we can tell when TLS credentials have changed | ||
| type SecurityConfigTLSUpdater struct { | ||
| s *SecurityConfig |
There was a problem hiding this comment.
Let's call it securityConfig to avoid confusion with server.
ca/server.go
Outdated
| return s.s.ClientTLSCreds.Config().Certificates | ||
| } | ||
|
|
||
| // NotifyTLSChange starts watching for TLS changes in the security config, and returns a channel through which |
There was a problem hiding this comment.
Comment seems outdated
ca/server.go
Outdated
| } | ||
|
|
||
| // SecurityConfigTLSUpdater takes a security config and sets a watch so we can tell when TLS credentials have changed | ||
| type SecurityConfigTLSUpdater struct { |
There was a problem hiding this comment.
I'm wondering if this abstraction is really useful. It's only used in one place, and it seems like calling Watch would be clearer and lower this risk of leaking the watch (because the cancel return value would be ignored in that case).
Is there an upcoming change that will make this more useful?
If we keep the abstraction, can we return a cancel function instead of having a Stop method?
There was a problem hiding this comment.
I can pass a channel instead to the CA server notifying it when the TLS creds have been updated instead. But the SecurityConfig object already had a watch on it to notify the agent when the credentials or the root CA have changed so the agent session could restart. It seemed easier to bottleneck it through the SecurityConfig, but I could also instead try setting a private watch in the node.go code.
There was a problem hiding this comment.
I was just asking if we need a SecurityConfigTLSUpdater rather than having the single place where it's used call the Watch method directly.
There was a problem hiding this comment.
I was planning on removing the security config entirely from the CA server at a later point, as part of separating out the security config from the CA server, in which case I just needed some kind of interface from which I can get the updated external CA creds.
Perhaps it'd be better just to pass a channel?
There was a problem hiding this comment.
I'd suggest removing the abstraction until we get to that point. Then we can see what makes most sense. Passing a channel may be best, especially if the lifecycle of the watch can be managed by something external to the CA server.
ca/config.go
Outdated
| } | ||
|
|
||
| // Close closes the security config, freeing up the watch queue resource | ||
| func (s *SecurityConfig) Close() { |
There was a problem hiding this comment.
Not sure if it makes sense, but having NewSecurityConfig return a cancel function might make it harder to leak these. On the other hand, security configs might always be stored somewhere long-term, in which case it wouldn't help.
In general this change seems to be going in the right direction (I'm happy to see SetWatch go), but making the SecurityConfig something whose resources need to be consciously deallocated is unfortunate.
There was a problem hiding this comment.
We mainly create the security config in the node/node.go code, and the watch gets cleaned up at the end of run anyway, so holding onto the cancel seems fine? Returning the cancel may make it harder to leak in tests, though, so: #2274
4a0490f to
108a3f1
Compare
|
Coming back to this after awhile. Is it ready for review? |
ca/server.go
Outdated
| lastSeenClusterRootCA *api.RootCA | ||
| lastSeenExternalCAs []*api.ExternalCA | ||
| secConfigMu sync.Mutex | ||
| signingMu sync.Mutex |
There was a problem hiding this comment.
I'm not sure signingMu is the right name. It seems to protect s.externalCA, and some aspects of the securityConfig.
Let's try to articulate what this mutex is used for, and from that we can figure out if the name makes sense.
There was a problem hiding this comment.
I guess it's supposed to protect the components of the CA server used to issue new certificates, so the rootCA object, and everything needed to connect to an external CA (hence the parts of securityConfig, because it has the TLS credentials needed to connect to the external CA)
There was a problem hiding this comment.
Please add this in a comment :)
There was a problem hiding this comment.
Ah ok, cool, I was hoping you might have a better name. :) I'm terrible at naming things, which is why I tend to end up with Java names (like RootCAAndExternalCAMutex or something)
| } | ||
|
|
||
| // ExternalCA returns the current external CA | ||
| func (s *Server) ExternalCA() *ExternalCA { |
There was a problem hiding this comment.
I see this used in a unit test. Is it called anywhere else?
There was a problem hiding this comment.
Nope, just the test. I can make the server test part of the ca package instead of the ca_test package to access it directly, if you prefer.
There was a problem hiding this comment.
Update: oh, right import cycle issues. :| nm, I can't actually do that.
There was a problem hiding this comment.
Please add a comment saying this is exposed to support unit testing and externalCA is nominally a private field.
34f7bc5 to
afac780
Compare
|
@aaronlehmann Also, apologies, I spent a while trying to remember what thought I had regarding this PR and I no longer remember it. :) So yes, this is ready for review. |
afac780 to
2658824
Compare
manager/controlapi/ca_rotation.go
Outdated
| rootPool.AppendCertsFromPEM(apiRootCA.CACert) | ||
| crossSignedCert, err = ca.NewExternalCA(nil, | ||
| ca.NewExternalCATLSConfig(securityConfig.ClientTLSCreds.Config().Certificates, rootPool), | ||
| urls...).CrossSignRootCA(ctx, newCARootCA) |
There was a problem hiding this comment.
Please break up this statement into two.
| if len(clusters) != 1 { | ||
| return errors.New("could not find cluster object") | ||
| } | ||
| s.UpdateRootCA(ctx, clusters[0]) // call once to ensure that the join tokens are always set |
There was a problem hiding this comment.
Making a note to close #2241 when this is merged.
There was a problem hiding this comment.
There is one more place where we have to change this (in manager) before closing that.
| case <-externalTLSCredsChange: | ||
| s.signingMu.Lock() | ||
| s.externalCA.UpdateTLSConfig(NewExternalCATLSConfig( | ||
| s.securityConfig.ClientTLSCreds.Config().Certificates, s.externalCAPool)) |
There was a problem hiding this comment.
It seems like this update to externalCA's TLS config is "eventually consistent", since we might pick up events from the updates channel between the time the credentials change and this is called. Will that cause any problems?
I suppose the use case here is node certificate renewals, so it seems reasonable to expect the old node certificate to keep working for that short interval. Root rotations might be a bit more of a corner case, but I'm not sure that's worth worrying about either.
There was a problem hiding this comment.
Sorry for the delay in response! Yes, it is eventually consistent - during that period, signing nodes may fail, but in a recoverable way, so eventually the correct certs will be issued. I've added a comment to that effect.
|
Overall I'm good with the refactor 👍 |
…ficate, rather than a RootCA object. Signed-off-by: Ying Li <ying.li@docker.com>
…irst step toward separating the root CA object used to sign certificates on the CA server from the SecurityConfig, which is mainly used to maintain client and server TLS credentials. Signed-off-by: Ying Li <ying.li@docker.com>
SecurityConfig's TLS credentials change. Signed-off-by: Ying Li <ying.li@docker.com>
Signed-off-by: Ying Li <ying.li@docker.com>
2658824 to
22b5c13
Compare
This is the first step towards splitting the CA server's signing functionality from
SecurityConfig, which serves more for providing creds for mTLS. Doing so will help unify the root CA update paths innode/node.go.The only node that ever needs the
ExternalCAobject is the leader that is running the CA server anyway, so it makes sense for it to go into the CA server. This also better makes it possible at a future point to provide separate creds for contacting the external CA server if we ever needed to do so.