Skip to content

crypto: Use a different library for ed25519/sr25519#6526

Merged
mergify[bot] merged 6 commits intotendermint:masterfrom
oasisprotocol:yawning/wip/curve25519-voi
Jun 26, 2021
Merged

crypto: Use a different library for ed25519/sr25519#6526
mergify[bot] merged 6 commits intotendermint:masterfrom
oasisprotocol:yawning/wip/curve25519-voi

Conversation

@Yawning
Copy link
Contributor

@Yawning Yawning commented Jun 2, 2021

At Oasis we have spend some time writing a new Ed25519/X25519/sr25519 implementation called curve25519-voi. This PR switches the import from ed25519consensus/go-schnorrkel, which should lead to performance gains on most systems.

Summary of changes:

  • curve25519-voi is now used for Ed25519 operations, following the existing ZIP-215 semantics.
  • curve25519-voi's public key cache is enabled (hardcoded size of 4096 entries, should be tuned, see the code comment) to accelerate repeated Ed25519 verification with the same public key(s).
  • (BREAKING) curve25519-voi is now used for sr25519 operations. This is a breaking change as the current sr25519 support does something decidedly non-standard when going from a MiniSecretKey to a SecretKey and or PublicKey (The expansion routine is called twice). While I believe the new behavior (that expands once and only once) to be more "correct", this changes the semantics as implemented.
  • curve25519-voi is now used for merlin since the included STROBE implementation produces much less garbage on the heap.

Side issues fixed:

  • The version of go-schnorrkel that is currently imported by tendermint has a badly broken batch verification implementation. Upstream has fixed the issue after I reported it, so the version should be bumped in the interim.

Open design questions/issues:

  • As noted, the public key cache size should be tuned. It is currently backed by a trivial thread-safe LRU cache, which is not scan-resistant, but replacing it with something better is a matter of implementing an interface.
  • As far as I can tell, the only reason why serial verification on batch failure is necessary is to provide more detailed error messages (that are only used in some unit tests). If you trust the batch verification to be consistent with serial verification then the fallback can be eliminated entirely (the BatchVerifier provided by the new library supports an option that omits the fallback if this is chosen as the way forward).
  • curve25519-voi's sr25519 support could use more optimization and more eyes on the code. The algorithm unfortunately is woefully under-specified, and the implementation was done primarily because I got really sad when I actually looked at go-schnorrkel, and we do not use the algorithm at this time.

@lgtm-com
Copy link

lgtm-com bot commented Jun 2, 2021

This pull request fixes 1 alert when merging cf56be3 into 96db0ae - view on LGTM.com

fixed alerts:

  • 1 for Use of insufficient randomness as the key of a cryptographic algorithm

@Yawning Yawning force-pushed the yawning/wip/curve25519-voi branch from cf56be3 to aaa66e0 Compare June 2, 2021 11:25
@lgtm-com
Copy link

lgtm-com bot commented Jun 2, 2021

This pull request fixes 1 alert when merging aaa66e0 into 1f46a4c - view on LGTM.com

fixed alerts:

  • 1 for Use of insufficient randomness as the key of a cryptographic algorithm

@ValarDragon
Copy link
Contributor

Very excited for these to land! I feel like it would be nice to not have state machine tx pubkeys have risk of being cached / throwing out validator pubkeys. I don't think this should block merge, as current cosmos chains don't use Ed25519 for user signatures. One way this could be changed is by making a separate CachedVerify and Verify methods, or by making a cached pubkey struct, and have it managed at that layer, instead of managed via a hashmap on the back-end.

I'll try to review the cryptography during the week.

@tac0turtle
Copy link
Contributor

I feel like it would be nice to not have state machine tx pubkeys have risk of being cached / throwing out validator pubkeys. I don't think this should block merge, as current cosmos chains don't use Ed25519 for user signatures

Cosmos-sdk has their own version of ed25519. We now look at the the Tendermint crypto package as solely tendermint related

@Yawning
Copy link
Contributor Author

Yawning commented Jun 3, 2021

With the way I integrated support for the cache (lookup done at verification time), the overhead of a miss isn't terrible. The extra computational overhead imposed by the caching layer is confined to synchronization (in the form of two tiny critical sections), book keeping, and allocating the cache entry, everything else is (mostly) work that has to happen to verify anyway. It does also result in slightly more work for the GC since intermediaries that were stack allocated are now heap allocated.

In comparison to not using the cache at all, a single-verification will take a penalty of ~3.6 usec on my laptop, which while not ideal isn't what I consider to be the end of the world.

If this was to be more optimized, replacing the cache implementation with something tailored to the sort of access patterns that tendermint is expected to see should be simple (eg: have a cache that has a split backing store, with an LRU cache and a separate map, the latter which is manually managed populated on validator set changes).

@ValarDragon
Copy link
Contributor

Oh thats not bad at all for cache miss times, seems good to me then

Cosmos-sdk has their own version of ed25519. We now look at the the Tendermint crypto package as solely tendermint related

Ah thanks, didn't know this

// cacheSize is the number of public keys that will be cached in
// an expanded format for repeated signature verification.
//
// TODO/perf: Either this should exclude single verification, or be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an artificial limit of 10.000 validators (https://github.com/tendermint/tendermint/blob/HEAD/types/vote_set.go?q=maxvalset#L18). Should we open an issue with this? I don't think this will come up for a while because there are other limitations that would prevent 4096 validators.

Copy link
Contributor

@tac0turtle tac0turtle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK. Would love another approval before approving

Copy link
Contributor

@cmwaters cmwaters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have next to zero knowledge (😉 ) on crypto primitives so we might want a second pair of eyes on that but as to the verify functions - it looks all good to me.

@Yawning
Copy link
Contributor Author

Yawning commented Jun 8, 2021

Just as a heads up, when I do the final rebase, I would like to bump the import to include oasisprotocol/curve25519-voi#63.

The current STROBE implementation that is the same as the one pulled in by the existing merlin import causes most sr25519 operations to do horrific things to the heap, which I solved by writing a "from the specification" implementation of STROBE. Details in the PR.

@Yawning Yawning force-pushed the yawning/wip/curve25519-voi branch 2 times, most recently from 3979717 to 8710280 Compare June 9, 2021 09:04
@codecov
Copy link

codecov bot commented Jun 9, 2021

Codecov Report

Merging #6526 (744d486) into master (167fa73) will increase coverage by 0.00%.
The diff coverage is 58.67%.

@@           Coverage Diff           @@
##           master    #6526   +/-   ##
=======================================
  Coverage   60.97%   60.98%           
=======================================
  Files         297      297           
  Lines       28030    28043   +13     
=======================================
+ Hits        17092    17101    +9     
+ Misses       9223     9222    -1     
- Partials     1715     1720    +5     
Impacted Files Coverage Δ
crypto/crypto.go 0.00% <ø> (ø)
crypto/sr25519/encoding.go 100.00% <ø> (ø)
crypto/sr25519/pubkey.go 23.52% <28.57%> (-33.00%) ⬇️
crypto/sr25519/privkey.go 44.64% <48.97%> (-6.52%) ⬇️
crypto/sr25519/batch.go 57.14% <57.14%> (-26.20%) ⬇️
crypto/ed25519/ed25519.go 44.89% <66.66%> (-0.76%) ⬇️
types/validation.go 80.62% <83.33%> (-4.57%) ⬇️
internal/p2p/conn/secret_connection.go 83.93% <100.00%> (-0.09%) ⬇️
privval/secret_connection.go 76.16% <100.00%> (-0.13%) ⬇️
types/tx.go 82.97% <0.00%> (-4.26%) ⬇️
... and 12 more

@Yawning Yawning force-pushed the yawning/wip/curve25519-voi branch from 8710280 to f415212 Compare June 9, 2021 09:13
@Yawning
Copy link
Contributor Author

Yawning commented Jun 9, 2021

Ok, I went ahead and rebased the branch, and bumped the import to pickup the STROBE changes. As an added bonus I exposed the merlin implementation (forked from George Tankersley's) that I initially left as an internal package and switched the import there as well.

It is API incompatible because ExtractBytes takes a destination buffer, and labels are strings, but I view both changes as improvements since they both make the code nicer to read.

@lgtm-com
Copy link

lgtm-com bot commented Jun 9, 2021

This pull request fixes 1 alert when merging f415212 into d6b4bc2 - view on LGTM.com

fixed alerts:

  • 1 for Use of insufficient randomness as the key of a cryptographic algorithm

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale for use by stalebot label Jun 20, 2021
@tac0turtle
Copy link
Contributor

@Yawning sorry for the delay. Could you resolve the conflicts and we can get to merging

@Yawning Yawning force-pushed the yawning/wip/curve25519-voi branch from f415212 to 7ee89d4 Compare June 25, 2021 08:58
@tac0turtle
Copy link
Contributor

oh whoops, also a changelog entry. Sorry, just noticed this.

Can it mention the breaking changes as well?

cc @sunnya97 we are breaking sr25519 in 0.35. Do you need anything from us for straight edge.

@Yawning Yawning force-pushed the yawning/wip/curve25519-voi branch 2 times, most recently from 0d778c4 to 7a6e3a0 Compare June 25, 2021 09:24
@Yawning
Copy link
Contributor Author

Yawning commented Jun 25, 2021

oh whoops, also a changelog entry. Sorry, just noticed this.

Can it mention the breaking changes as well?

This should be done now.

cc @sunnya97 we are breaking sr25519 in 0.35. Do you need anything from us for straight edge.

If they do not use sr25519.GenPrivKeyFromSecret things might be "ok", since I kept the byte-serialized PrivKey as a sr25519 MiniSecretKey. I would still recommend existing keys that were generated with that routine to be regenerated because the extra expansion removes 3-bits of the scalar.

@Yawning Yawning force-pushed the yawning/wip/curve25519-voi branch from 7a6e3a0 to 1b5655f Compare June 25, 2021 09:34
@lgtm-com
Copy link

lgtm-com bot commented Jun 25, 2021

This pull request fixes 1 alert when merging 1b5655f into 917180d - view on LGTM.com

fixed alerts:

  • 1 for Use of insufficient randomness as the key of a cryptographic algorithm

@tac0turtle tac0turtle added the S:automerge Automatically merge PR when requirements pass label Jun 25, 2021
@tac0turtle
Copy link
Contributor

@Yawning sorry to bother again. Can you update to master again, then the bot should be able to merge this.

Yawning added 5 commits June 26, 2021 03:54
The old benchmark measured:
 * `sigsCount` GenPrivKey + Sign
 * `sigsCount` BatchVerifier.Add
 * `b.N/sigsCount` BatchVerifier.Verify (majority of the runtime).

This is all sorts of nonsensical, especially give that there can be a
non-trivial amount of overhead in `BatchVerifier.Add`.

The rewritten benchmark measures:
 * `b.N/sigsCount` NewBatchVerifier
 * `b.N/sigsCount * sigsCount` BatchVerifier.Add
 * `b.N/sigsCount` BatchVerifier.Verify

This is far more sensible as it better reflects the per-signature
combined cost of instantiating the batch verifier, adding a signature
to the batch, and the verify.
Having to redo quite a bit of computation in the event of a batch
failure if the caller is actually interested in which signature failed,
is rather sub-optimal.  Change the batch verification interface to
expose a one-shot batch verify + fallback call, so that sensible
libraries can handle this case faster.

This commit also leverages the failure information so that validation
will only ever call one of `verifyCommitBatch` or `verifyCommitSingle`.
Switch the sr25519 implementation to curve25519-voi for better
performance, and code quality.

Performance comparisions should still be taken with a grain of salt
for the following reasons:

 * curve25519-voi's sr25519 support can use more optimization.
 * go-schnorrkel cuts corners in places by:
   * Not doing delinearization at all when verifying batches
   * Not using the secret key nonce at all when signing.
   * Not sampling random scalars at all when verifying batches,
     unless the import is bumped.

WARNING: This is a breaking change as the original tendermint sr25519
support expands the MiniSecretKey twice, while this implementation
only does it once.
@Yawning Yawning force-pushed the yawning/wip/curve25519-voi branch from 1b5655f to c5e0263 Compare June 26, 2021 03:55
@lgtm-com
Copy link

lgtm-com bot commented Jun 26, 2021

This pull request fixes 1 alert when merging c5e0263 into 167fa73 - view on LGTM.com

fixed alerts:

  • 1 for Use of insufficient randomness as the key of a cryptographic algorithm

Since I had to fork gtank's to add a transcript RNG, and ended up
rewriting the STROBE implementation for my fork, the code might as well
use it for every use of merlin as there are a number of improvements,
the most notable being a dramatic reduction in the number of allocations
done for the various STROBE calls.
@Yawning Yawning force-pushed the yawning/wip/curve25519-voi branch from c5e0263 to 744d486 Compare June 26, 2021 16:44
@mergify mergify bot merged commit c5cc3c8 into tendermint:master Jun 26, 2021
@lgtm-com
Copy link

lgtm-com bot commented Jun 26, 2021

This pull request fixes 1 alert when merging 744d486 into 167fa73 - view on LGTM.com

fixed alerts:

  • 1 for Use of insufficient randomness as the key of a cryptographic algorithm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C:crypto Component: Crypto S:automerge Automatically merge PR when requirements pass T:enhancement Type: Enhancement T:perf Type: Performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants