hashes: split Hash into GeneralHash and Hash#2910
hashes: split Hash into GeneralHash and Hash#2910apoelstra merged 8 commits intorust-bitcoin:masterfrom
Hash into GeneralHash and Hash#2910Conversation
Pull Request Test Coverage Report for Build 9646113412Details
💛 - Coveralls |
51ae63a to
aed525d
Compare
Pull Request Test Coverage Report for Build 9646393134Details
💛 - Coveralls |
This commit illustrates the transformation I intend to make everywhere we use newtyped hashes as "general hashes". *Within the module that the newtype is defined* I encapsulate engine calls, which I do by calling engine methods on the underlying general hash function. So within the module there is a slight reduction in type safety, in the sense that I need to make sure that I'm wrapping stuff properly. But outside of the module, there will be no difference except that I will no longer export engine/from_engine/hash/etc on newtyped hashes. Instead callers will need to compute the newtyped hash only in ways supported by the API. In theory we could have a macro to produce engine/from_engine/etc for newtypes that want to act as general hashes. But AFAICT there is no use case for this. Alternately, we could have a macro that produces *private* Engine types and private engine/from_engine/etc methods for the hashes, which could be used within the module and would provide stronger type safety within the module. But in practice, raw hashing is usually only used within a couple of methods, so all this infrastructure is way overkill and will just make maintenance harder for everybody.
In the next commits we are going to stop exposing the ability to hash arbitrary data into wrapped hash types like Txid etc. In preparation for this, stop using these methods internally. This makes our internal code a little bit uglier and less DRY. An alternative approach would be to implement the from_engine and engine methods, but privately (and maybe having a macro to provide this). But I think this approach is more straightforward. The one exception is for the Taproot hashes, which are tagged hashes and currently do not have their own engine type. I will address these in a later PR because this one is already too big.
This is a continuation of the previous commit, but separated to make review a little easier. This one replaces test vectors that were previously computed by hashing garbage into Txids and various other hash types with new test vectors which are directly-computed garbage converted to hashes with from_byte_array. In one case (src/hash_types.rs) this results in changing a bunch of fixed test vectors. This is okay; this test is supposed to check the direction of string serialization, which is unaffected by this commit (or any commit in this PR). The existing test vectors, because they hash the empty string, result in different bytes depending on the underlying hash algo (sha256, sha256d, sha256t, etc). The new ones just use the same fixed test vector for all of them. This commit also updates a doctest in crypto/sighash.rs which demonstrates how to manually feed sighash data into a hash engine and correctly handle the sighash single bug. Because you can no longer directly get a sighash object from an engine, this particular example should maybe be rewritten to just encode to a Vec rather than a hash engine, explaining that maybe you'd do this when implementing a HWW, to verify the exact data being hashed. Or something. Unrelatedly, you can check that there are no API changes in this commit or the last several. The next commit will remove GeneralHash impls and that's when you'll see changes.
…in hash_newtype We manually implement these methods (and the GeneralHash trait) on newtypes around sha256t::Hash, because tagged hashes require a bit more work. In the next commit (API diff) you will see that this affects two hashes, which are the only things that appear green in the diff. Users who want to implement their own engine/from_engine types now need to do it on their own. We do this for the non-Taproot sighash types in `bitcoin` (though only privately) to demonstrate that it's possible.
There are a few green lines for the Tap*Hash types, which are tagged hashes, and whose engine/from_engine impls are replaced. The rest is red, for the other hashtypes where these methods are just dropped. A later PR will also drop the methods for the Tap*Hash types but that seemed different enough from the rest to warrant its own PR.
aed525d to
8bd5c64
Compare
Pull Request Test Coverage Report for Build 9646767973Details
💛 - Coveralls |
|
Ok, ready to review. I had to do the API update, and in doing so, I noticed that there was some unfinished work with the Taproot hashes (which I decided to defer to a later PR, but I did update the commit messages in this one accordingly). |
|
I agreed with your view that eventually Thanks for picking this up, my sanity is much better preserved from reviewing this instead of coding it :) |
|
Let's split the traits, add inherent methods, and get sha256t working properly, and then explore dropping |
|
Yeah, I don't like the name much but I can agree with followup rename.
Not sure if entirely relevant but in primitives PR I've noted that we should be able to depend on
In a perfect world, I agree. If |
| + convert::AsRef<[u8]> | ||
| { | ||
| /// The byte array that represents the hash internally. | ||
| type Bytes: hex::FromHex + Copy; |
There was a problem hiding this comment.
Unrelated but I've just realized we can have a sealed IsByteArray trait that's implemented for [u8; N]. Maybe we can even enforce that the lengths match (even if we can't use LEN directly).
There was a problem hiding this comment.
We could, yeah, though I'm not sure it's necessary. And possibly some future implementor will want to use ArrayVec or something for some reason.
There was a problem hiding this comment.
As a lint and possibly preparation for the future where we just force it so people don't have to worry about it at all. I doubt usefulness of anything other than array. There's a fixed LEN constant for a reason. We could say "hash is whatever byte slice" but that is silly.
hashes/src/util.rs
Outdated
| fn from_byte_array(bytes: Self::Bytes) -> Self { Self::from_byte_array(bytes) } | ||
| } | ||
|
|
||
| // To be dropped in the next commit |
There was a problem hiding this comment.
Nice diff-minimizing idea.
|
|
||
| fn check_result(engine: sha256::HashEngine) { | ||
| let hash = TestType::from_engine(engine); | ||
| let hash = TestType(sha256::Hash::from_engine(engine)); |
There was a problem hiding this comment.
Yes, this is how I wanted it for the exact reason you stated in the commit message. However I also had .finalize method taking advantage of inference so you could just write TestType(engine.finalize()). I don't insist on this but it'd be nice to have eventually.
There was a problem hiding this comment.
Like, sha256::Engine::finalize could return an arbitrary From<sha256::Hash>?
There was a problem hiding this comment.
Oh, actually the comment about inference isn't even relevant. Engine implies specific hash type.
| header_data[0..32].copy_from_slice(&self[..]); | ||
| header_data[32..64].copy_from_slice(&previous_filter_header[..]); | ||
| FilterHeader::hash(&header_data) | ||
| FilterHeader(sha256d::Hash::hash(&header_data)) |
There was a problem hiding this comment.
Unrelated but it'd be better to just feed the data into engine rather than copying arrays around.
| } | ||
| Self::from_engine(engine) | ||
| } | ||
| } |
There was a problem hiding this comment.
I guess we could just make these (optionally) private in a subsequent PR.
I'm not thrilled with these names. Personally I would prefer having
ByteArrayWrapper(for non-general hashes) andHash(for general hashes). But usingHashandGeneralHashgreatly minimizes the diff because most of our use of theHashtrait was only for non-general stuff.Maybe that tradeoff will change as we move stuff to inherents? I hope to do that in the next PR (or maybe the one after that, since I still have some "drop
GeneralHashwork to do for tagged hashes). And after that the hashing API should be "clean" enough that we can figure out HashEngine, possibly even folding GeneralHash into it. But that's the part of #2770 that we didn't finish nailing down so I'm not sure.But other than naming, I like this PR. I think, if you approve of this PR except the naming, it would be best to ACK it and then we can do a followup rename-only PR, rather than dealing with the review pain of mass-renaming in rebases.