feat: cryptoki via UNIX socket#3421
Conversation
307d71e to
0bbbb1c
Compare
0bbbb1c to
0cdce00
Compare
0cdce00 to
95b71f4
Compare
95b71f4 to
064a7fe
Compare
Robot Results
|
1fa67be to
dcb1738
Compare
dcb1738 to
3665bb1
Compare
3665bb1 to
e8fef0c
Compare
f4ec0b5 to
3373c30
Compare
3373c30 to
4326f00
Compare
4326f00 to
b3e70ff
Compare
10339b2 to
066fd35
Compare
066fd35 to
b3d5da6
Compare
|
@Bravo555 As a follow-up it would be good to update to cryptoki 0.9.0 |
reubenmiller
left a comment
There was a problem hiding this comment.
A few points to discuss:
- Does the tedge-p11-server support reading from a tedge.toml config file (it doesn't look like it)? Previously this was the case, and it could load the pkcs11 module path from it, and the pin, now the user has to provide it via arguments which will make this harder to package as a feature. So at the moment we have configuration in tedge.toml but nothing is reading it.
- Who is responsible for providing the pin? The client or the server?
- The cryptoki feature is not enabled by default yet, we can enable this now since the gnu build dependency for tedge has been removed.
|
During testing I also found an unhandled panic (the exact root cause is also unclear): Unhandled panic during tedge reconnect$ tedge reconnect c8y
Disconnecting from Cumulocity
Removing bridge config file... ✓
Bridge doesn't exist. Device is already disconnected from Cumulocity.
Reconnecting with config:
device id: rmi_exampleA0001
cloud profile: <none>
cloud host: thin-edge-io.eu-latest.cumulocity.com:8883
auth method: certificate
certificate file: /opt/homebrew/etc/tedge/device-certs/tedge-certificate.pem
bridge: built-in
service manager: homebrew
mosquitto version: 2.0.21
thread 'main' panicked at crates/extensions/tedge-p11-server/src/signer.rs:48:54:
called `Result::unwrap()` on an `Err` value: Hit the end of buffer, expected more data
note: run with `RUST_BACKTRACE=1` environment variable to display a backtraceOn the tedge-p11-server side, the following ERROR log entries are visible. |
didier-wenzek
left a comment
There was a problem hiding this comment.
The design is in a good shape.
| rustls-native-certs = { workspace = true } | ||
| rustls-pemfile = { workspace = true } | ||
| sha-1 = { workspace = true } | ||
| tedge-p11-server = { workspace = true } |
There was a problem hiding this comment.
The direct dependency on cryptoki can now be removed as being unused. Nice as this is the main goal of this PR!
I even wonder if we still need the cryptoki feature.
There was a problem hiding this comment.
Yeah, the feature flag can removed unless we find other unexpected blockers.
There was a problem hiding this comment.
Yeah the cryptoki feature can definitely be removed now, however we just spoke and there might still be value in keeping the direct loading of the cryptoki library in cases where tedge is compiled with gnu libc (this is the case in Yocto)...but the logic was more "we already have the logic there, so why not just leave it"...so if we run into the slightest of problems due to that decision, then we can remove it.
There was a problem hiding this comment.
The cryptoki feature has been removed but not the dependency (which is actually unused).
| @@ -41,34 +41,29 @@ pub fn create_tls_config_cryptoki( | |||
| client_certificate: impl AsRef<Path>, | |||
| cryptoki_config: CryptokiConfig, | |||
There was a problem hiding this comment.
Can this actually be something else than CryptokiConfig::SocketService ?
There was a problem hiding this comment.
The idea is to still allow using a cryptoki module directly (i.e. without spinning up and connecting to a signing server) if running on a supported target (GNU libc) to simplify deployment. tedge_config will be changed to make it more clear:
device.cryptoki.mode(will rename becauseenableimplies a boolean)off- don't use cryptokimodule- use cryptoki module directly, requiresdevice.cryptoki.module_pathto be setsocket/service/tedge-p11- use a signing service, requiresdevice.cryptoki.socket_pathto be set
There was a problem hiding this comment.
I guess you mean socket/service/tedge-p11 => socket or tedge-p11-server
| pub struct Frame { | ||
| /// A version tag for possible future versions, currently ignored. | ||
| // although i'm not sure if it's actually necessary now because we can always add it later if need be; it's because | ||
| // we know that 1) we're having one connection/one call, so we read until EOF and know we have a valid message (so | ||
| // don't have use a scheme like TLV) and 2) we can always add new fields | ||
| pub version: Version, | ||
| pub payload: Payload, | ||
| } |
There was a problem hiding this comment.
As currently implemented the version tag is indeed not so useful (the payload being independent of the version).
I would rather use somethings along these lines, so the version check is done by serde and let server and clients handle older versions.
| pub struct Frame { | |
| /// A version tag for possible future versions, currently ignored. | |
| // although i'm not sure if it's actually necessary now because we can always add it later if need be; it's because | |
| // we know that 1) we're having one connection/one call, so we read until EOF and know we have a valid message (so | |
| // don't have use a scheme like TLV) and 2) we can always add new fields | |
| pub version: Version, | |
| pub payload: Payload, | |
| } | |
| pub enum Frame { | |
| Version1 { pub payload: Payload }, | |
| Version2 { pub payload: Payload, pub extra: PayloadExtra }, | |
| } |
There was a problem hiding this comment.
Changed in 2f30c39.
Enum is indeed more suitable, because indeed it makes more sense for the type system to enforce that only (VersionA, PayloadA), (VersionB, PayloadB) are valid.
However I didn't include the version 2 variant because I don't know what the PayloadExtra should be. A map, perhaps, but of what?
Honestly, all this versioning makes my head spin where I kind of have a mental block because I have no idea if the things are doing will be backwards compatible or not 🙁. Maybe we should have a call about that, but I'm not sure if publishing the v1 with the guarantee that new clients will be able to work with this v1 server.
There was a problem hiding this comment.
However I didn't include the version 2 variant because I don't know what the PayloadExtra should be. A map, perhaps, but of what?
Yeah, it was just an example. As of now only Version1 makes sense.
I'm not sure if publishing the v1 with the guarantee that new clients will be able to work with this v1 server.
Encoding version in the message frames is indeed not enough. For that to work:
- A server that running version N+1 must still be able to accept and respond to requests version N and below.
- A server should be able to return an error to the client, notably to tell a request cannot be serialized.
- A client running version N+1 should be able to downgrade its request to the version actually used by the server (determined by the version of the error frame returned from the first error).
With the exception of the 2nd point (A frame aimed to return an error to the client), nothing specific has to be implemented in this PR.
Also, the |
| pub enum Frame1 { | ||
| ChooseSchemeRequest(ChooseSchemeRequest), |
There was a problem hiding this comment.
Adding an error case, is required to able to handle version in the future. See #3421 (comment)
| pub enum Frame1 { | |
| ChooseSchemeRequest(ChooseSchemeRequest), | |
| pub enum Frame1 { | |
| ProtocolError, | |
| ChooseSchemeRequest(ChooseSchemeRequest), |
There was a problem hiding this comment.
Fixed in 1a3990a. I've also taken a liberty to add an error message string.
I've also sorted out a bit the idea of versioning in my head and have a bit better understanding.
Basically, because the version tag is the first thing we send over after the connection is established, we can read only the first byte and decide everything from there. Currently with version 1, after reading the version byte, we expect:
- Client to send over a
Frame1value that is a request. - Client to close its sending half.
- At this point server reads the request until EOF, closes reading half, sends a response which is a single
Frame1that is not a request (can be an error) and closes sending half. - Client should read the
Frame1response until EOF and close reading half. - At this point the connection is terminated on both ends.
But in the future, using a hypothetical version 2, we can in theory move away from that model and send multiple requests and responses over a single connection.
- old clients will keep working with new server, maintaining forwards compatibility (but this will probably be less common case)
- new clients, to work with older server, will need some version negotiation mechanism to know what's the server version to choose appropriate message to send, which achieves backwards compatibility. This can be, as you suggested, an error returned by the server when client uses a new request that server can't support, at which point client can open a new connection and use the correct version, or a server can send over its own version first, which the client can read and decide what type of request to send. Due to using a UNIX socket, bandwidth and latency are not a concern, so the solution should be chosen based on simplicity.
There was a problem hiding this comment.
Fixed in 1a3990a. I've also taken a liberty to add an error message string.
Perfect!
I've also sorted out a bit the idea of versioning in my head and have a bit better understanding. Basically, because the version tag is the first thing we send over after the connection is established, we can read only the first byte and decide everything from there. Currently with version 1, after reading the version byte, we expect:
I would simply let serde figure out which version is the received frame. Responding with the same version or returning a protocol error if the frame cannot be deserialized - forcing the peer to downgrade its protocol.
But in the future, using a hypothetical version 2, we can in theory move away from that model and send multiple requests and responses over a single connection.
Possibly.
For rustls we have `default-features = false` because we don't want to use aws-lc-rs, but it also disabled other default features: `std` and `tls12`. These features have been reenabled. Feature determination scheme appears to be wonky as when I started working on p11 server, rustls screamed at me about `ClientConfig::builder` not existing. Indeed it requires `rustls/std` feature, but it worked fine before. Signed-off-by: Marcel Guzik <marcel.guzik@cumulocity.com>
didier-wenzek
left a comment
There was a problem hiding this comment.
I'm okay to merge this PR once approved by @reubenmiller . The code is neat and clear. I have not checked the integration with systemd sockets, though, my focus being on the socket protocol.
| rustls-native-certs = { workspace = true } | ||
| rustls-pemfile = { workspace = true } | ||
| sha-1 = { workspace = true } | ||
| tedge-p11-server = { workspace = true } |
There was a problem hiding this comment.
The cryptoki feature has been removed but not the dependency (which is actually unused).
| match self.mode { | ||
| Cryptoki::Off => Ok(None), | ||
| Cryptoki::Module => Ok(Some(CryptokiConfig::Direct(CryptokiConfigDirect { | ||
| module_path: self | ||
| .module_path | ||
| .or_config_not_set() | ||
| .context("required because `device.cryptoki.mode` is set to `module`")? | ||
| .clone(), | ||
| pin: self.pin.clone(), | ||
| serial: self.serial.or_none().cloned(), | ||
| }))), | ||
| Cryptoki::Socket => Ok(Some(CryptokiConfig::SocketService { | ||
| socket_path: self | ||
| .socket_path | ||
| .or_config_not_set() | ||
| .context("required because `device.cryptoki.mode` is set to `socket`")? | ||
| .clone(), | ||
| })), |
There was a problem hiding this comment.
Nicely reduce the risk of misconfiguration highlighted here #3421 (comment)
reubenmiller
left a comment
There was a problem hiding this comment.
Approved.
I was able to test both the socket and module modes, and was able to successfully connect to Cumulocity with the built-in bridge and using a Yubikey (when starting the tedge-p11-server on the command line). The systemd integration has not been tested, however I'm ok with merging it as is.
Despite modifying certificate crate, this commit doesn't introduce any functional changes there (only import paths are updated). Signed-off-by: Marcel Guzik <marcel.guzik@cumulocity.com>
Instead of always binding the socket, check if any sockets are passed by systemd using sd_listen_fds. If there are, use the first one. If not, fallback to creating a socket manually. The service is started by systemd's socket, but otherwise operates normally, i.e. runs the accept() loop handling multiple requests. This corresponds to `Accept=no` systemd socket configuration. Support for `Accept=yes` will be added in the next commit. A new dependency sd-listen-fds was added, that only contains the required function, but it can be replaced with libsystemd in the future. Signed-off-by: Marcel Guzik <marcel.guzik@cumulocity.com>
This commit adds the `device.cryptoki.socket_path` setting and, if present, connects to the signing service socket under this path instead of loading cryptoki dynamic module directly. Signed-off-by: Marcel Guzik <marcel.guzik@cumulocity.com>
Signed-off-by: Marcel Guzik <marcel.guzik@cumulocity.com>
If can be `off`, `socket` or `module` to disable cryptoki or select a mode in which it's used: using cryptoki module directly or tedge-p11-server signing service. Signed-off-by: Marcel Guzik <marcel.guzik@cumulocity.com>
Proposed changes
This PR adds the
tedge-p11-servercomponent, that unlike other services is built separately and runs directly on the host to allow other thin-edge services access to PKCS#11 cryptographic tokens. The server listens on a UNIX domain socket and exposes two kinds of procedures that is passes for execution to the PKCS#11 cryptographic token:The crate also exports a client that performs
rustls' choosing scheme and signing operations by connecting via the UNIX socket and requesting them from the server, which interacts with the cryptoki device directly.Configuration changes
Replaced
device.cryptoki.enablewithdevice.cryptoki.modewhich can be one ofoff/module/socketoff- cryptoki is disabled and private key is read from a filemodule- thin-edge uses a cryptoki dynamic module directly, which is preferable in environments where that's supported, i.e. dynamic module can be loaded and thin-edge process has access to the cryptographic token (e.g. running directly on the host and using GNU libc)socketConnecting totedge-p11-servervia a UNIX socket. This is preferable when not using GNU libc or when no access to cryptographic token due to running in an unprivileged containerSetting to
moduleimplies requireddevice.cryptoki.module_path. Similarly, setting tosocketimplies required the newdevice/cryptoki.socket_pathsetting, which contains the path to thetedge-p11-serverUNIX socket.device.cryptoki.serialsetting was removed, because it's currently unused. Once related functionality is implemented, it will be reintroduced.Types of changes
Paste Link to the issue
#3363
Checklist
cargo fmtas mentioned in CODING_GUIDELINEScargo clippyas mentioned in CODING_GUIDELINESTODO
Connectionstruct like in tokio framing guideFollow-up
pkcs11:model=PKCS%%2315%%20emulated;manufacturer=piv_II;)At the moment, errors are wrapped multiple times which results in duplicated info, for instance when the key pairs are mismatched (e.g. certificate does not match the private key), then the following error message shows the same error message repeated multiple times:
Further comments