Cryptech Project - FutureWork

Secure Channel

2017-07-27T00:24:00+00:00

This is a sketch of a design for the secure channel that we want to have between the Cryptech HSM and the client libraries which talk to it. Work in progress, and not implemented yet because a few of the pieces are still missing.

Design goals and constraints

Basic design goals:

End-to-end between client library and HSM.
Not require yet another presentation layer if we can avoid it (so, reuse XDR if possible, unless we have some strong desire to switch to something else).
Provide end-to-end message integrity between client library and HSM.
Provide end-to-end message confidentiality between client library and HSM. We only need this for a few operations, but between PINs and private keys it would be simpler just to provide it all the time than to be selective.
Provide some form of mutual authentication between client library and HSM. This is tricky, since it requires either configuration (of the other party's authenticator) or leap-of-faith. Leap-of-faith is probably good enough for most of what we really care about (ensuring that we're talking to the same dog now as we were earlier).

Not 100% certain we need this at all, but if we're going to leave ourselves wide open to monkey-in-the-middle attacks, there's not much point in having a secure channel at all.
Use boring simple crypto that we already have (or almost have) and which runs fast.
Continue to support multiplexer. Taken together with end-to-end message confidentiality, this may mean two layers of headers: an outer set which the multiplexer is allowed to mutate, then an inner set which is protected. Better, though, would be if the multiplexer can work just by reading the outer headers without modifying anything.
Simple enough that we can implement it easily in HSM, PKCS #11 library, and Python library.

Why not TLS?

We could, of course, Just Use TLS. Might end up doing that, if it turns out to be easier, but TLS is a complicated beast, with far more options than we need, and doesn't provide all of what we want, so a fair amount of the effort would be, not wasted exactly, but a giant step sideways. Absent sane alternatives, I'd just suck it up and do this, with a greatly restricted ciphersuite, but I think we have a better option.

Design

Basic design lifted from "Cryptography Engineering: Design Principles and Practical Applications" (ISBN 978-0-470-47424-2, http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470474246.html), tweaked in places to fit tools we have readily available.

Toolkit:

AES
SHA-2
ECDH
ECDSA
XDR

As in the book, there are two layers here: the basic secure channel, moving encrypted-and-authenticated frames back and forth, and a higher level which handles setup, key agreement, and endpoint authentication.

Chapter 7 outlines a simple lower layer using AES-CTR and HMAC-SHA-256. I don't see any particular reason to change any of this; AES-CTR is easy enough. It might be worth looking into AES-CCM and AES-GCM, but they're somewhat more complicated. Section 7.5 ("Alternatives") discusses these briefly, and we also know some of the authors.
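As a rough illustration of the authentication half of that lower layer, here is a minimal Python sketch of a numbered frame protected with HMAC-SHA-256. The frame layout (4-byte big-endian counter, payload, 32-byte tag) and the names seal and open_frame are assumptions made for this example, not the book's exact construction and not an agreed wire format. The AES-CTR encryption step is deliberately left out because the Python standard library has no AES.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32  # HMAC-SHA-256 output size in bytes


def seal(auth_key, counter, payload):
    """Return counter || payload || tag; the tag covers counter and payload."""
    header = struct.pack(">I", counter)
    tag = hmac.new(auth_key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


def open_frame(auth_key, last_counter, frame):
    """Verify a frame; return (counter, payload) or raise ValueError."""
    if len(frame) < 4 + TAG_LEN:
        raise ValueError("frame too short")
    header, payload, tag = frame[:4], frame[4:-TAG_LEN], frame[-TAG_LEN:]
    expected = hmac.new(auth_key, header + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad MAC")
    (counter,) = struct.unpack(">I", header)
    if counter <= last_counter:
        # Counters must strictly increase, so replays and reordering fail.
        raise ValueError("replayed or reordered frame")
    return counter, payload
```

The receiver keeps the last accepted counter per direction and passes it in as last_counter, so a captured frame cannot be replayed later in the same session.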

For key agreement we probably want to use ECDH. We don't quite have that yet, but in theory it's relatively minor work to generalize our existing ECDSA code to cover that too, and, again in theory, it should be possible to generalize our existing ECDSA fast base point multiplier Verilog cores into fast point multiplier cores (sic: the limitation of the current cores is that they only compute scalar times the base point, not scalar times an arbitrary point, which is fine for ECDSA signing but doesn't work for ECDH).
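To make the base point versus arbitrary point distinction concrete, here is a toy Python sketch. It uses a tiny textbook curve (y^2 = x^3 + 2x + 2 over GF(17), base point (5, 1), group order 19), which is far too small for real use and is not one of the curves the HSM supports. All function names are illustrative. base_mul corresponds to what the current cores can do; point_mul is the generalization ECDH needs, since each side must multiply its secret by the peer's public point.

```python
P, A = 17, 2      # toy curve y^2 = x^3 + 2x + 2 over GF(17); NOT a real curve
G = (5, 1)        # base point, order 19


def add(p1, p2):
    """Add two affine points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                      # p2 is the negative of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def point_mul(k, point):
    """Scalar times an arbitrary point (double-and-add). Needed for ECDH."""
    result, addend = None, point
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def base_mul(k):
    """Scalar times the fixed base point. Enough for ECDSA signing."""
    return point_mul(k, G)


# ECDH: each side publishes k*G, then multiplies the peer's point by its secret.
alice_secret, bob_secret = 3, 7
alice_public, bob_public = base_mul(alice_secret), base_mul(bob_secret)
alice_shared = point_mul(alice_secret, bob_public)
bob_shared = point_mul(bob_secret, alice_public)
```

The last two lines are the step that a base-point-only multiplier cannot perform. This sketch is not constant time and does no point validation, both of which a real implementation would need.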

For signature (mutual authentication) we probably want to use ECDSA, again because we have it and it's fast. The more interesting question is the configuration vs leap-of-faith discussion, figuring out under which circumstances we really care about the peer's identity, and figuring out how to store state.

Chapter 14 (key negotiation) of the same book covers the rest of the protocol, substituting ECDH and ECDSA for DH and RSA, respectively. As noted in the text, we could use a shared secret key and a MAC function instead of public key based authentication.
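Whatever negotiation we pick, both ends then have to turn the single shared secret into separate keys for each direction and purpose. A minimal sketch of that step follows. The label strings, the key names, and the choice of HMAC-SHA-256 as the derivation function are all assumptions made for illustration; the book's own derivation differs in detail and should be consulted before implementing anything.

```python
import hashlib
import hmac

# Hypothetical labels: one key per direction and per purpose.
LABELS = {
    "client_to_hsm_enc": b"enc client->hsm",
    "client_to_hsm_auth": b"auth client->hsm",
    "hsm_to_client_enc": b"enc hsm->client",
    "hsm_to_client_auth": b"auth hsm->client",
}


def derive_channel_keys(shared_secret):
    """Derive four independent 32-byte keys from one shared secret."""
    return {
        name: hmac.new(shared_secret, label, hashlib.sha256).digest()
        for name, label in LABELS.items()
    }
```

Using distinct keys per direction means a frame sent by the client can never be reflected back and accepted as a frame from the HSM.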

Alternatively, the Station-to-Station protocol described in 4.6.1 of "Guide to Elliptic Curve Cryptography" (ISBN 978-0-387-95273-4, https://link.springer.com/book/10.1007/b97644) appears to do what we want, straight out of the box.

Interaction with multiplexer is slightly interesting. The multiplexer really only cares about one thing: being able to match responses from the HSM to queries sent into the HSM, so that the multiplexer can send the responses back to the right client. At the moment, it does this by seizing control of the client_handle field in the RPC frame, which it can get away with doing because there's no end-to-end integrity check at all (yuck). We could add an outer layer of headers for the multiplexer, but would rather not.

The obvious "real" identity for clients to use would be the public keys (ECDSA in the above discussion) they use to authenticate to the HSM, or a hash (perhaps truncated) thereof. That's good as far as it goes, and may suffice if we can assume that clients always have unique keys, but if client keys are something over which the client has any control (which includes selecting where they're stored, which we may not be able to avoid), we have to consider the possibility of multiple clients using the same key (yuck). So a candidate replacement for the client_handle for multiplexer purposes would be some combination of a public key hash and a process ID, both things the client could provide without the multiplexer needing to do anything.
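A sketch of what such an identifier could look like, assuming a SHA-256 hash truncated to 8 bytes followed by a 32-bit process ID. The truncation length, the field widths, and the function name are hypothetical choices for this example, not a proposed format.

```python
import hashlib
import os
import struct


def client_identifier(public_key, pid=None, hash_len=8):
    """Return truncated SHA-256(public_key) || 32-bit process ID."""
    if pid is None:
        pid = os.getpid()
    digest = hashlib.sha256(public_key).digest()[:hash_len]
    return digest + struct.pack(">I", pid & 0xFFFFFFFF)
```

Two processes sharing one key get identifiers with the same prefix but different suffixes, which is all the multiplexer needs to route responses. Process IDs are only unique per host, so this does not help if clients sharing a key sit on different machines behind the same multiplexer.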

The one argument in favor of leaving control of this to the multiplexer (rather than the endpoints) is that it would (sort of) protect against one client trying to masquerade as another -- but that's really just another reason why clients should have their own keys to the extent possible.

As a precaution, perhaps the multiplexer should check for duplicate identifiers, then do, um, something? if it finds duplicates. This kind of violates Steinbach's Guideline for Systems Programming ("Never test for an error condition you don't know how to handle"). Obvious answer is to break all connections old and new using the duplicate identity, minor questions about how to reset from that, whether worth doing at all, etc. Maybe clients just shouldn't do that.
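For what it's worth, the "break everything using the duplicate identity" policy is easy to state in code. This is only a sketch: the class name, the register method, and the idea of remembering poisoned identifiers are assumptions for illustration, and it does not answer the open question of how an identifier ever gets reset.

```python
class ConnectionTable:
    """Hypothetical multiplexer bookkeeping: one connection per identifier."""

    def __init__(self):
        self.connections = {}   # identifier -> connection object
        self.poisoned = set()   # identifiers that have been seen twice

    def register(self, identifier, conn):
        """Return the connections the caller must close (empty on success)."""
        if identifier in self.poisoned:
            return [conn]
        if identifier in self.connections:
            old = self.connections.pop(identifier)
            self.poisoned.add(identifier)
            return [old, conn]          # break both old and new
        self.connections[identifier] = conn
        return []
```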

Open issues

Does the resulting design pass examination by clueful people?
Does this end up still being significantly simpler than TLS?
The Cryptography Engineering protocols include a hack to work around a length extension weakness in SHA-2 (see section 5.4.2). Do we need this? Would we be better off using SHA-3 instead? The book claims that SHA-3 was expected to fix this, but that was before NIST pissed away their reputation by getting too cosy with the NSA again. Over my head, ask somebody with more clue.
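For reference while discussing the length extension question: one generic workaround is to hash the digest a second time, so that the value an attacker sees is no longer the raw internal state of the first hash. The sketch below shows that idea only. Whether it matches the construction in section 5.4.2 exactly needs to be checked against the book, and the name sha256d is ours.

```python
import hashlib


def sha256d(message):
    """Double SHA-256: hash the message, then hash the 32-byte digest."""
    inner = hashlib.sha256(message).digest()
    return hashlib.sha256(inner).digest()
```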

Development of a Cryptech ASIC Implementation

2016-12-15T22:44:00+00:00

Introduction

The aim of the Cryptech project is to develop an open, free, and auditable HSM. The Cryptech HSM includes both SW and HW parts. In at least the first iteration of the Cryptech HSM, the HW parts are implemented using FPGA devices. However, the ability to implement the HW parts in a Cryptech ASIC device in a future iteration is anticipated in the design. This text provides a short description of what the HW part of the Cryptech HSM contains, the design style used, and what would have to change in order to implement the HW part in an ASIC.

General digital functions and internal memories

The Cryptech digital functionality cores, such as the SHA-256 core, are written in generic RTL (Register Transfer Level) Verilog code. The code is written in a fairly conservative coding style and uses language features from IEEE 1364-2001 (aka Verilog 2001).

All RTL code is divided into modules that contain one process for register updates and reset (reg_update) and one or more combinational processes for datapath and support logic such as counters. Finally, if needed, each module has a separate process that implements the logic for the finite state machine that controls the behaviour of the module.

All cores are divided into a core, for example sha256_core.v, and a number of submodules the core instantiates. The core provides raw, wide ports (a 256 bit wide key for AES, for example) that are not suitable for use in a stand alone system. Instead, each core comes with a top level wrapper, for example sha256.v. This top level wrapper contains all registers and logic needed to provide all functionality of the core via a simple 32-bit memory like interface. If the core is going to be used as a tightly integrated submodule, the wrapper can be discarded. Similarly, if the core is going to be used in a bus system that uses a specific bus standard such as AMBA AHB, CoreConnect or WISHBONE, only the top level wrapper will need to be replaced or modified to match the desired bus standard.
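From the software side, the effect of the wrapper is that a wide value has to be moved as a series of 32-bit word accesses. The following Python sketch shows the idea for a 256-bit key. The word ordering (most significant word first), the address arithmetic, and the write_word callback are assumptions for illustration and do not describe the actual register map of any Cryptech core.

```python
def key_to_words(key):
    """Split a byte string into big-endian 32-bit words."""
    if len(key) % 4:
        raise ValueError("key length must be a multiple of 4 bytes")
    return [int.from_bytes(key[i:i + 4], "big") for i in range(0, len(key), 4)]


def write_key(write_word, base_addr, key):
    """Write a wide key through a 32-bit interface, one word per address.

    write_word(addr, value) is a hypothetical bus access function.
    """
    for offset, word in enumerate(key_to_words(key)):
        write_word(base_addr + offset, word)
```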

The RTL code does not explicitly instantiate any hard macros such as memories, multipliers, etc. Instead all such functions are left to the synthesis tool to infer based on the code. All memories are placed in separate modules to allow easy modification of the design. In an ASIC setting any memories not automatically mapped will be replaced by instantiation of specific macros.

Some of the memories in the designs have combinational read (i.e. the read data is not held by an output register, which would imply a one cycle read latency). For some FPGA technologies these memories are not compatible with the available physical memories. The synthesis tools therefore implement these memories using separate registers rather than selecting a memory instance. In an ASIC implementation these memories would likely become real memory macros to allow for a faster and more compact implementation.

Interfaces

External interfaces such as GPIO, Ethernet GMII, UART, etc., will always require some modification for the Cryptech design to be implemented in a given technology, whether it is a specific FPGA type or an ASIC. The important thing is that the Cryptech design does not use technology specific macros to implement the interfaces. But pin assignments, timing, and electrical requirements will always require adjustment and work.

Clocking and reset

The design style used in the Cryptech Verilog code currently follows the guidelines from the FPGA vendors Altera and Xilinx. This means that we use synchronous reset. For an ASIC implementation this will also work, even though asynchronous reset is far more common in ASIC designs. Changing to asynchronous reset is not a very big undertaking however, as the register reset and update clocking are separated into easily identifiable processes (reg_update) in the modules.

Most if not all registers in the Cryptech Verilog code have a defined reset state. Most registers also have a write enable signal that controls the update. This corresponds well with the registers available in FPGA technologies from Altera and Xilinx and with the design strategy those vendors recommend. It is also in line with common, good design styles for ASICs, which allow for compact code and low power implementations. The design does not currently use any clock gating. In future revisions this might be added if power consumption needs to be reduced, provided it does not add side channel issues.

External memories

The Cryptech hardware design will use external persistent memories for protected key storage as well as external SRAM for protected master key storage. In an ASIC implementation the master key memory would probably be integrated to further enhance security.

Just like other external interfaces (see above), the interfaces for the external memories do not use any explicitly instantiated hard macros in the FPGAs.

Entropy sources

The current Cryptech design contains two separate physical entropy sources.

1: An avalanche noise based entropy source placed outside the FPGA. The entropy source signal is sampled by the FPGA using an edge detection mechanism.

An ASIC implementation would be able to use the external entropy source just like the FPGA. Furthermore, depending on the process options, it might be possible to have an internal avalanche diode based on ESD structures commonly used in I/O pin implementations. In a power management capable process, functionality available in step-up converters might also be usable as an internal avalanche noise source.

Note that integrating the avalanche noise source does not mean that an off-chip noise source is excluded. The Cryptech RNG is modular and having both an internal and an external avalanche noise source is quite possible.

2: A ring oscillator based entropy source placed inside the FPGA. The ring oscillator used in the FPGA is based on carry chain feedback through adders. An ASIC implementation of this ring oscillator should work and produce noise with similar characteristics. However the specific circuit will have to be characterized with explicit layout and qualified for the given process.

Toolchain

Cryptech currently uses Verilog simulators for functional verification and commercial FPGA tools for implementation, including timing analysis.

An ASIC implementation will require several new tools, including tools for synthesis, place & route, and static timing analysis that are acceptable as sign-off tools to the chip process vendor.

Conclusions

The HW designed for the first iteration of Cryptech is not specifically designed for FPGA implementation, but is in fact designed in a generic way to allow for easy implementation using different technologies such as ASICs.

There are however parts of the design that will have to be updated or modified in order to create a good ASIC implementation. The Cryptech project is confident that we know what those parts are and what they would entail.

Developing an ASIC will however require new tools which will incur costs.

Issues of an Assured Tool-Chain

2016-12-15T22:44:00+00:00

We do not have any assurance that our basic tools are not compromised.

Compilers
Operating Systems
Hardware Platforms
Verilog and Other Tools to Produce Chips

At the base is the compiler. The fear was first formally expressed in Ken Thompson's 1984 Turing Award Lecture, "Reflections on Trusting Trust".

David A. Wheeler's PhD thesis, "Fully Countering Trusting Trust through Diverse Double-Compiling", outlines how we might deal with the compiler trust conundrum.